1 Introduction

During this period we [...]. The results will be extrapolated
to HDTV video sequences.

Currently we have completed [...] Thomson Consumer [...].

In the following section we give a brief overview of the [...]. In
section 3, we look at some coding results obtained using [...] and
compare these results to those obtained using the CCITT H.261 [...].
In section 4, we evaluate these [...] and make some suggestions as to
how the ADTV system could be implemented in the NASA network.

Some caveats are in order here. The sequences we used for testing the
simulator and generating the results are those used for [...] the MPEG
algorithm. The sequences are of much lower resolution than the HDTV
sequences would be, and therefore the extrapolations are not [...]. We
expect to get significantly higher [...] with sequences that are of
higher resolution. The simulator is a valid one, and should HDTV
sequences become available, they could be used directly with the
simulator.

In addition to the work described in this report, [...] or were
submitted during the past six months, describing [...] grant and its
predecessor. The papers are included in the appendix.

• Y. C. Chen, K. Sayood, and D. J. Nelson, “A Robust Coding Scheme
for Packet Video,” IEEE Transactions on Communications, vol. [...],
pp. 1491-1501, September 1992.

• K. Sayood, “Data Compression in Remote Sensing Applications,” IEEE
Geoscience and Remote Sensing Society Newsletter, no. 84, pp. 7-15,
September 1992.

• A. C. Hadenfeldt and K. Sayood, “Compression of Color-Mapped
Images,” submitted to IEEE Transactions on Geoscience and Remote
Sensing.

• K. Sayood and S. Na, “Recursively Indexed Differential Pulse Code
Modulation,” Proceedings IEEE/DIMACS Workshop on Quantization and
Coding, Piscataway NJ, November 1992.

• K. Sayood, F. Liu, and J. D. Gibson, “A Joint Source Channel Coder
Design,” accepted for 1993 International Conference on Communication,
Geneva, Switzerland, May 1993.

• A. C. Hadenfeldt and K. Sayood, “Compression of Color-Mapped
Images,” accepted for 1993 International Conference on Communication,
Geneva, Switzerland, May 1993.

• B. Goriala, K. Sayood, and G. Meempat, “An Image Compression
Scheme for Use in Token Ring Networks,” submitted to [...]
International Conference on Communication, Geneva, Switzerland,
May [...].


2 Advanced Digital Television 

There are three key elements in the ADTV system.

• ADTV uses the MPEG++ (Moving Pictures Expert Group) draft proposal
as its compression scheme.

• ADTV incorporates a Prioritized Data Transport (PDT), which is a cell
relay-based data transport layer to support the [...] of video data.
PDT also offers service flexibility and compatibility with broadband
ISDN.

• ADTV applies spectral-shaping techniques to Quadrature Amplitude
Modulation (QAM) to minimize interference from and to any co-channel
NTSC signals.

We have simulated all aspects of the compression algorithm of the ADTV
proposal. The compression algorithm, as described in the Advanced
Digital Television System Description submitted to FCC/ACATS and as
implemented in the simulator, is described below.


2.1 Compression Algorithm 


The basic compression approach is the MPEG++ algorithm, which upgrades
the standard MPEG approach to HDTV performance level. The key
components of this algorithm are described below.

2.1.1 Group of Pictures (GOP)

A GOP comprises up to three types of frames: the I, P and B frames. The
I frames are processed using only an intra-frame DCT coder with
adaptive quantization; the P frames are processed using a hybrid
temporal predictive DCT coder with adaptive quantization and forward
motion compensation; the B frames are processed using a hybrid temporal
predictive DCT coder with adaptive quantization and bidirectional
motion compensation. The I and P frames are referred to as the anchor
frames because of their roles in the bidirectional motion compensation
of the B frames. The GOP structure offers a good tradeoff between the
high efficiency of temporal predictive coding, the good
error-concealment features of periodic intra-only processing, and fast
picture acquisition.

2.1.2 Input Sequencer 

The GOP data structure requires some unique sequencing of the input
video frames. Because of the backward motion compensation in B-frame
processing, the anchor frames must be processed before the B frames
associated with the two anchors. The frames are transmitted in the
same order as they are processed.
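
The reordering described above can be sketched as follows. This is a minimal illustration in Python, not the C simulator used for this report; the function name `coding_order` and the treatment of trailing B frames that have no following anchor are assumptions made for the example.

```python
def coding_order(frame_types):
    """Map display order to coding/transmission order.

    frame_types is a string such as "IBBPBBP". Each B frame is held back
    until the next anchor (I or P) has been emitted, because its backward
    reference must be coded first.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # anchor goes out first
            order.extend(pending_b)  # then the B frames it closes off
            pending_b = []
    # Assumption: B frames with no following anchor are flushed at the end.
    order.extend(pending_b)
    return order


display = "IBBPBBPBBPBBI"
print([display[i] + str(i) for i in coding_order(display)])
```

For the display sequence IBBP the function returns [0, 3, 1, 2]: the P frame is coded before the two B frames that precede it in display order.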

2.1.3 Raster Line to Block/Macroblock Converter 

The basic DCT transform unit is an 8 x 8 pixel block called a block. The
basic quantization unit is four adjacent blocks of Y, and one U and one
V block. Such a quantization unit is called a macroblock. The converter
converts the raster line format to the block and the macroblock format.
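
A sketch of such a converter is given below. It assumes that the chroma planes have half the luma resolution in each direction, which matches the 1440Hx960V luma and 720Hx480V chroma sizes listed with Table 1, and that the luma dimensions are multiples of 16. The names `extract_block` and `to_macroblocks` and the dictionary layout are illustrative only.

```python
def extract_block(plane, top, left, size=8):
    """Cut a size x size block out of a plane stored as a list of rows."""
    return [row[left:left + size] for row in plane[top:top + size]]


def to_macroblocks(y, u, v):
    """Convert raster-ordered Y, U, V planes into a list of macroblocks.

    Each macroblock holds four adjacent 8x8 Y blocks (a 16x16 luma area)
    plus the co-located 8x8 U block and 8x8 V block.
    """
    height, width = len(y), len(y[0])
    macroblocks = []
    for mb_row in range(height // 16):
        for mb_col in range(width // 16):
            top, left = 16 * mb_row, 16 * mb_col
            luma = [extract_block(y, top + dy, left + dx)
                    for dy in (0, 8) for dx in (0, 8)]
            macroblocks.append({
                "Y": luma,
                "U": extract_block(u, top // 2, left // 2),
                "V": extract_block(v, top // 2, left // 2),
            })
    return macroblocks
```

With a 1440x960 luma plane this yields 90 x 60 macroblocks per frame.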

2.1.4 I-frame Processing

An I frame is processed by an intra-frame DCT coder without motion
compensation. A fixed quantizer is applied to the DC coefficient. The
AC coefficients are first weighted by a downloadable quantization
matrix before [...] quantization. The quantization step for the AC
coefficients is controlled by a Rate Controller. The I-frame coding is
essentially the same as the JPEG scheme.


2.1.5 P-frame Processing 

A P frame is first processed by forward motion compensation; motion is
always referenced to the nearest past anchor frame. The search region
is proportional to the number of B frames between two consecutive
anchor frames. The prediction residue or the original macroblock,
depending on the motion compensation result, goes to the DCT coder and
quantizer. For macroblocks coded without motion compensation, the DCT
coefficient quantization is identical to that used for the I frames.
For motion-compensated macroblocks, the DC and AC coefficients are
quantized with the same uniform quantizer.


2.1.6 B-frame Processing 

Unlike the P frames, the B frames are subjected to bidirectional motion
compensation. The motion references are the two anchor frames
sandwiching the B frames. The search regions are proportional to the
temporal distance between the B frame and the two anchor frames. The
B-frame macroblocks have a number of modes. In addition to all the
modes for a P-frame macroblock, the B-frame macroblock further includes
a bidirectional interpolative mode, using both forward and backward
motion compensation, and a unidirectional mode. In the interpolative
mode, the average of the forward and the backward motion-compensated
macroblocks is used as the prediction macroblock. The B-frame
macroblock is then processed as a P-frame macroblock.

2.1.7 Differential, Run-Length and Variable-Length Coding (VLC)

The quantized DC coefficients of all the I-frame macroblocks and of the
P- and B-frame macroblocks in intra mode are coded with a DPCM coder.
The quantized AC coefficients are coded with VLC after zigzag scan
ordering. Motion vectors are differentially coded. In addition, VLC is
applied to all the coded information: motion vectors, macroblock
addresses, block types, etc.
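
The steps that precede the variable-length coder can be illustrated with the short sketch below: DPCM of the DC values, zigzag scanning of a quantized block, and run-length pairing of the AC coefficients. The code tables of the VLC itself are not reproduced here. The function names and the "EOB" end-of-block marker are assumptions made for the example, not part of the ADTV description.

```python
def zigzag_indices(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, r if diagonal % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(ac):
    """Pair each nonzero AC value with the number of zeros preceding it."""
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end marker
    return pairs


def dpcm(dc_values, initial=0):
    """Replace each DC value by its difference from the previous one."""
    diffs, previous = [], initial
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def scan_block(block):
    """Return (DC value, run-length pairs of the AC values) for one block."""
    scan = [block[r][c] for r, c in zigzag_indices(len(block))]
    return scan[0], run_length(scan[1:])
```

The (run, value) pairs and the DC differences are the symbols that a variable-length code would then map to bit strings.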


2.2 Data Prioritization Layer 

The Prioritization Layer comprises the Priority Processor and the Rate
Controller.


2.2.1 Priority Processor

Based on the information from the Rate Controller, the Priority
Processor pre-calculates the rate of HP (High Priority)/LP (Low
Priority) for every frame. HP/LP fractional allocations may vary with
the frame type. Every data element gets a priority assignment from the
Priority Processor according to its importance. The header is always
most important, followed by the motion vector, DC value, low frequency
coefficients and high frequency coefficients.


2.2.2 Rate Controller 

The Rate Controller monitors the status of the rate buffers in the
Transport Encoder. It uses the buffer occupancy information to compute
the necessary compression requirement and feeds the results, in the
form of appropriate quantization parameters, to the Video Processor in
the Compression Encoder. The Rate Controller also provides input to the
Priority Processor regarding the initial allocation of HP/LP rate for
the next Group of Pictures.

The algorithm used in this simulation for rate control is given by

[...]

where QS is the quantizer step-size, and B is a measure of buffer fullness.


2.3 Transport Layer 

The Transport Layer comprises the Transport Processor and the Rate Buffer. 


2.3.1 Transport Processor 

Data elements are supplied to the Transport Processor from the
Prioritization Processor. The Transport Processor generates appropriate
header fields for each data group. The header fields are used in the
construction of a basic transport unit called a cell. A cell has a
header and a trailer enclosing a payload area. Each cell has a fixed
size of 256 bytes. The header contains chaining and segmentation
information which allows data groups to be segmented across cells. This
feature limits the propagation of channel errors from one cell to the
next. The trailer field contains a 16-bit error-checking CRC code.
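
The cell structure can be illustrated with a small sketch. Only the 256-byte cell size and the 16-bit CRC trailer come from the description above. The 4-byte header layout (a sequence number and a payload length), the zero padding, and the choice of the CRC-CCITT polynomial (via Python's `binascii.crc_hqx`) are assumptions made purely for illustration; the actual ADTV header fields are not specified here.

```python
import binascii
import struct

CELL_SIZE = 256      # bytes, from the text
HEADER_SIZE = 4      # hypothetical: 2-byte sequence number + 2-byte length
TRAILER_SIZE = 2     # 16-bit CRC, from the text
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE - TRAILER_SIZE


def make_cells(data):
    """Segment a byte string into fixed-size cells with a CRC trailer."""
    cells = []
    for seq, start in enumerate(range(0, len(data), PAYLOAD_SIZE)):
        chunk = data[start:start + PAYLOAD_SIZE]
        header = struct.pack(">HH", seq & 0xFFFF, len(chunk))
        body = header + chunk.ljust(PAYLOAD_SIZE, b"\x00")
        crc = binascii.crc_hqx(body, 0xFFFF)  # CRC over header + payload
        cells.append(body + struct.pack(">H", crc))
    return cells


def check_cell(cell):
    """Return True if the trailer CRC matches the header and payload."""
    body = cell[:-TRAILER_SIZE]
    (crc,) = struct.unpack(">H", cell[-TRAILER_SIZE:])
    return binascii.crc_hqx(body, 0xFFFF) == crc
```

Because each cell carries its own check, a corrupted cell can be detected and discarded without affecting the cells around it.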


2.3.2 Rate Buffer 

Since the number of cells generated is not constant and the channel
coding module interfaces with the Transport Processor at a fixed clock
rate, we need a buffer to smooth out any rate variation. The maximum
end-to-end delay is dependent on the size of the buffer.


3 Simulation Results 

The ADTV system described above without the priority and transport pro- 
cessors was simulated in detail. The simulation programs were written in C 
and implemented on a SUN workstation. Along with the ADTV system we 
also simulated a video coding scheme based on the CCITT H.261 recommen- 
dations. The purpose was to have a benchmark for simulation. 

In our ADTV simulator, the frames were arranged in the following
sequence:

IBBPBBPBBPBBIBBP...

The sequence used for testing the simulator was the Susie sequence.
This sequence contains both low and moderate motion of the type to be
encountered in most NASA applications. We present the results in the
form of graphs, tables and a videotape accompanying this report.

The coding rates and PSNR under different coding conditions are listed
in Table 1. Two parameters are used to control the coding conditions.
The parameter p controls the output rate and length of the rate buffer.
The fullness of the rate buffer determines the quantizer step-size and
therefore the coding rate and quality. Thus the parameter p has an
important impact on both coding rate and quality. The parameter t is
used to decide whether the macroblock after motion compensation needs
coding. Smaller values of t will lead to higher rates, while larger
values of t will result in lower quality.
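
For reference, the kind of statistics reported in Table 1 can be computed from per-frame data as sketched below. The exact definitions used in our simulator are not restated in this report, so the sketch assumes 8-bit samples (peak value 255) for PSNR, a population standard deviation for the rate, and a peak-to-average ratio defined as the largest per-frame rate divided by the mean rate. The function names are illustrative.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-sized frames given as flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def rate_statistics(bits_per_frame, fps=29.97):
    """Mean, standard deviation and peak-to-average ratio of the coding rate.

    Rates are expressed in Mbits/s, one value per frame.
    """
    rates = [bits * fps / 1e6 for bits in bits_per_frame]
    mean = sum(rates) / len(rates)
    std = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
    return {"mean": mean, "std": std, "par": max(rates) / mean}
```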

The first three sequences were coded using the ADTV algorithm. In
Sequence 1, with p=3, the average rate is 0.22 bits/pixel. This rate is
not sufficient to effectively code the I frames. As the B and P frames
depend heavily on the I frames, this has a cascade effect on the entire
sequence. As the quantizer step-size depends on how full the buffer is,
this low rate leads to the buffer getting filled up as the lower
portions of the I frame are being coded. This means that when coding
the lower regions of the I frames, the quantizer is coarse. This
results in blocking artifacts which are very noticeable in the lower
portions of the sequence. This effect can be seen in the first sequence
on the videotape. When the coder has finished with the I frame and the
B and P frames are being coded, the buffer situation gets partly
remedied, as the coding rate is lower for the B and P frames. However,
this is not sufficient to get the quantizer step-size small enough to
remove the blocking effect. When the buffer size is increased (p=5) the
number of artifacts is reduced, as seen in the second sequence on the
videotape. An interesting effect can be seen in the third sequence.
Here the buffer size was kept the same as in the second sequence;
however, the motion compensation threshold t was kept high. This means
that blocks that would have been coded in sequence 2 are left uncoded
in sequence 3. This in turn emphasizes the blocking effects. One would
think that, given the fact that we are accepting more distortion, the
rate would go down. However, as we can see from Figure 1, the rates for
sequence 2 and sequence 3 are almost identical. This could be
attributable to the fact that the poor reconstruction of the P frames
leads to poor prediction and hence an increase in bit rate, which takes
away any savings from the higher motion compensation threshold. Thus
the parameter t, while it affects the quality, has little effect on the
rate.

From Figure 1 we can see that the ADTV algorithm generates very bursty
traffic. This is in sharp contrast to the CCITT H.261 algorithm, which
produces relatively smooth output. We can see this from the results for
sequences 4-6, which were coded using the CCITT H.261 algorithm with
the same parameters p and t as the first three sequences. The rate and
PSNR results for these sequences are given in Table 1 and Figures 3 and
[...]. [...] the only significant differences between the CCITT H.261
algorithm and the ADTV algorithm are the sequencing and motion
compensation [...]. The intra-frame coding of every 13th frame in the
ADTV algorithm increases the bit rate. This is compensated for by using
the different motion compensation approach, giving the bursty traffic.
In the CCITT H.261 algorithm, intra-frame coding is recommended once in
every 134 frames; thus there is no significant variation in the rate
from frame to frame. The disadvantage of the CCITT H.261 algorithm when
compared to the ADTV algorithm is the decrease in the ability to
randomly access any particular frame and the decrease in the ability to
react fast to sudden scene changes. Furthermore, it should be noted
that the ADTV algorithm was proposed for HDTV sequences, while the
sequences we are using have significantly less resolution.

Due to the importance of the I frame, which serves as the anchor frame
for both P and B frames, we decided to put more coding effort into such
frames to try to eliminate the blocking effect. In Sequences 7-9 the
ADTV algorithm was modified to keep the quantization step-size QS
constant while coding the I frame. One effect is that the buffer
becomes very full during coding of the I frame, and the subsequent
frame gets very little of the coding resources. This results in an
increase in burstiness, as can be seen from Figures [...] and [...].
However, this approach does result in the reduction/elimination of the
blocking effect. That such a simple strategy can result in such
dramatic improvement shows that, should blocking effects appear in the
HDTV sequences, attention should be paid to the encoding of the I
frames.

Figures 7-14 show various comparison results between the ADTV, the
modified ADTV and the CCITT H.261 algorithms. While these comparisons
show an advantage for the CCITT H.261 algorithm, subjective comparisons
tend to show the reverse. We invite the reader to examine the videotape
and draw their own conclusions.


4 HDTV and CCSDS 

To speculate on how the HDTV service would be accommodated by the NASA
network, we briefly review some of the relevant features of the CCSDS
recommendations.


4.1 CCSDS Principal Network 

A “CCSDS Principal Network” (CPN) serves as the project data handling
network which provides end-to-end data flow in support of the
Experimental, Observational and Interactive users of Advanced Orbiting
Systems. A CPN consists of an “Onboard Network” in an orbiting segment
connected through a CCSDS “Space Link Subnetwork” (SLS) either to a
“Ground Network” or to another Onboard Network in another orbiting
segment. The SLS is the central component of a CPN; it is unique to the
space mission environment and provides customized services and data
communications protocols. Within the SLS, CCSDS defines a full protocol
to achieve “cross support” between agencies. Cross support is defined
as the capability for one space agency to bidirectionally transfer
another agency’s data between ground and space systems using its own
transmission resources. A key feature of this protocol is the concept
of a “Virtual Channel”, which allows one physical space channel to be
shared among several data streams, each of which may have different
service requirements. A single physical space channel may therefore be
divided into several logical data channels, each known as a Virtual
Channel.

Eight separate services are provided within a CPN. Two services
(“Path” and “Internet”) operate end-to-end across the entire CPN. They
are complementary services, which satisfy different user data
communications requirements: some users will interface with only one of
them, but many will operate with both. The remaining six services
(“Encapsulation”, “Multiplexing”, “Bitstream”, “Insert”, “Virtual
Channel Data Unit” and “Physical Channel”) are provided only within the
Space Link Subnetwork for special applications such as audio, video,
high rate payloads, tape playback, and the intermediate transfer of
Path and Internet data. Our interest is with the services provided by
the Space Link Subnetwork.

4.2 Space Link Subnet Services 

The Space Link Subnet supports the bidirectional transmission of data
through the space/ground and space/space channels which interconnect
the distributed elements of the CCSDS Advanced Orbiting Systems. It
also provides “direct connect” transmission services for certain types
of data which require timely or high-rate access to the space channel.
During SLS transfer, different flows of data are separated into
different Virtual Channels, based on data handling requirements at the
destination. These Virtual Channels are interleaved onto the physical
channel as a serial symbol stream. A particular Virtual Channel may
contain either packetized or bitstream data, or a combination of both.

The Space Link Subnetwork consists of two layers: the Space Link layer
and the Space Channel layer, which correspond to the ISO-equivalent
Data Link layer and Physical layer respectively. Efficient use of the
physical space channel was a primary driver in the development of these
protocols. The Space Link layer is composed of the Virtual Channel Link
Control (VCLC) sublayer and the Virtual Channel Access (VCA) sublayer.

The main function of the VCLC sublayer is to convert incoming data into
a protocol data unit which is suitable for transmission over the
physical space channel. Four types of protocol data units may be
generated by the VCLC sublayer: fixed length blocks of CCSDS Packets,
called “Multiplexing Protocol Data Units” (M-PDUs); fixed length blocks
of Bitstream data, called “Bitstream Protocol Data Units” (B-PDUs);
fixed length blocks of mixed packetized and isochronous data, called
“Insert Protocol Data Units” (IN-PDUs); and fixed length blocks of data
for use by retransmission control procedures, called “Space Link ARQ
Procedure Protocol Data Units” (SLAP-PDUs).

There are several procedures in the VCLC sublayer to perform this
function. The Encapsulation procedure provides the flexibility to
handle virtually any packet structure. It prepends a primary header to
delimited data units (including Internet packets), turning each into a
CCSDS Packet. The Multiplexing procedure multiplexes together those
CCSDS Packets that share the same Virtual Channel. The length of the
multiplexing protocol data unit is fixed, since it is required to fit
exactly in the fixed length data space of a VCDU/CVCDU. There may be
some packets which overlap two or more M-PDUs, and a “first packet
pointer” points out where the first packet starts. Some user data, such
as audio, video, playback and encrypted information, will simply be
presented to the SLS as a stream of bits or octets. The Bitstream
procedure simply blocks these data into individual Virtual Channels and
transmits them. When the transmission rate is high, Bitstream data may
be transmitted over a dedicated Virtual Channel. Alternatively, if the
transmission rate is low, it can be inserted at the front of other
packetized or bitstream data. This is called the Insert procedure.
Through this procedure, bandwidth can be used more efficiently.
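
The role of the first packet pointer in the Multiplexing procedure can be seen in the sketch below, which packs variable-length packets into fixed-length data fields. It is a simplification for illustration, not the CCSDS procedure itself: the function name `pack_mpdus`, the use of `None` when no packet starts inside a data field, and the zero-byte padding of the last field are all assumptions.

```python
def pack_mpdus(packets, data_size):
    """Pack variable-length packets into fixed-length M-PDU data fields.

    Returns a list of (first_packet_pointer, data) tuples. The pointer is
    the offset, within the data field, of the first packet that starts
    there, or None if the field only continues an earlier packet.
    """
    stream = b"".join(packets)
    starts, offset = [], 0
    for packet in packets:
        starts.append(offset)  # absolute position where this packet begins
        offset += len(packet)

    mpdus = []
    for base in range(0, len(stream), data_size):
        chunk = stream[base:base + data_size]
        inside = [s - base for s in starts if base <= s < base + data_size]
        pointer = inside[0] if inside else None
        mpdus.append((pointer, chunk.ljust(data_size, b"\x00")))
    return mpdus
```

A receiver that loses an M-PDU can use the pointer in the next one to find a packet boundary and resynchronize.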

The last procedure of VCLC is the Space Link ARQ Procedure (SLAP),
which is used to provide guaranteed Grade-1 delivery of data over the
links that interconnect the space and ground elements of a CPN. The
SLAP-PDU carries “Link ARQ Control Words” (LACWs) which report progress
on receipt of data flowing in the opposite direction. Upon arrival at
the receiving end, the LACW is extracted from the PDU, and the sequence
number is checked to assure that no data has been lost or duplicated.
In the event of a sequence error, the LACW carried by PDUs traveling in
the opposite direction is used to signal that a retransmission is
required. This retransmission begins with the first PDU that was not
received in sequence, and all subsequent PDUs are retransmitted in the
order in which they were originally provided to the SLAP from the layer
above.
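
The retransmission rule described above (restart from the first PDU not received in sequence and resend everything after it) can be sketched as follows. This is a toy model for illustration only: the link drops PDUs only on the first pass, the returned `expected` value stands in for the progress report carried in the LACW, and the function names are hypothetical.

```python
def receive_in_sequence(arriving, expected):
    """Accept PDUs that arrive in sequence and discard all others.

    Returns (accepted, next_expected); next_expected is the sequence
    number the receiver would report back to the sender.
    """
    accepted = []
    for seq in arriving:
        if seq == expected:
            accepted.append(seq)
            expected += 1
    return accepted, expected


def transfer(sent, lost):
    """Send PDUs over a link that drops those in `lost` on the first pass.

    `sent` must be a non-empty list of consecutive sequence numbers.
    Returns (delivered, retransmitted).
    """
    first_pass = [s for s in sent if s not in lost]
    delivered, expected = receive_in_sequence(first_pass, sent[0])
    # Retransmit from the first missing PDU onward, in the original order.
    retransmitted = [s for s in sent if s >= expected]
    more, expected = receive_in_sequence(retransmitted, expected)
    return delivered + more, retransmitted
```

Note that PDUs 3, 4 and 5 in a six-PDU transfer are resent when PDU 2 is lost, even though they arrived the first time; this is the cost in delay and bandwidth that makes the procedure unattractive for video.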

The VCA sublayer creates the protocol data units used for space link
data transfer; these are either “Virtual Channel Data Units” (VCDUs) or
“Coded Virtual Channel Data Units” (CVCDUs), and are formed by
appending fixed length Header, Trailer and (for CVCDUs) error
correction fields to the fixed length data units generated by the VCLC
sublayer. The VCA sublayer is composed of Virtual Channel Access (VCA)
and Physical Channel Access (PCA) procedures. The VCA procedure
generates VCDUs for protocol data units which come from the VCLC
sublayer, or accepts independently generated VCDUs from reliable users.
A VCDU with powerful error-correcting Reed-Solomon check symbols
appended to it is called a CVCDU. Relative to a VCDU, a CVCDU contains
more error-control information and hence less user data. The “Virtual
Channel ID” field of the VCDU/CVCDU Header enables up to 64 VCs to be
run concurrently for each assigned Spacecraft ID on a particular
physical space channel. Since space data is transmitted through a
weak-signal, noisy channel as a serial symbol stream, a robust frame
synchronization process at the receiving end is required. Therefore
fixed length VCDUs/CVCDUs are used, and the PCA procedure prefixes a
42-bit Synchronization Marker to each VCDU/CVCDU to form a “Channel
Access Data Unit” (CADU). A contiguous and continuous stream of fixed
length CADUs, known as a “Physical Channel Access Protocol Data Unit”
(PCA-PDU), is transmitted as individual channel symbols through the
ISO-equivalent Physical Layer of the Space Link Subnet, which is known
as the “Space Channel Layer”.


4.3 Space Link “Grades of Service” 

Three different “grades of service” are provided by the Space Link
Subnet, using a combination of error detection, error correction and
retransmission control techniques. Note that each virtual channel can
only support a single grade of service.


4.3.1 Grade-3 Service 

This service provides the lowest quality of service. Data transmitted
using Grade-3 service may be incomplete, and there is a moderate
probability that errors induced by the Space Link Subnet are present
and that the sequence of data units is not preserved. A VCDU is
discarded if an uncorrectable error is detected at the destination.
Grade-3 service should not be used for transmission of asynchronous
packetized data, because it provides insufficient protection for the
extensive control information contained in the packet headers.


4.3.2 Grade-2 Service 

The CVCDU is the unit of transmission that supports Grade-2 service.
The Reed-Solomon encoding provides extremely powerful error correction
capabilities. Data transmitted using Grade-2 service may be incomplete,
but data sequencing is preserved and there is a very high probability
that no data errors have been induced by the Space Link Subnet.


4.3.3 Grade-1 Service 

Data transmitted using Grade-1 service are delivered through the Space
Link Subnet complete, in sequence, without duplication and with a very
high probability of containing no errors. It is provided by using two
paired Reed-Solomon encoded Virtual Channels, in opposite directions,
so that an Automatic Repeat Queueing (ARQ) retransmission scheme may be
implemented.


5 HDTV Transmission on the CCSDS Network

As described in the previous section, some user data, such as audio,
video, playback and encrypted information, can simply be presented to
the Space Link Subnet (SLS) as a stream of bits or octets. The SLS
merely blocks these data into individual Virtual Channels and transmits
them using Bitstream Service. Some bitstream data, such as digitized
video and audio, will have stringent delivery timing requirements and
are known as “isochronous” data. For the transmission of ADTV coded
information, the channel transmission rate is high enough to dedicate a
specific Virtual Channel. Although the coding output rate is quite
bursty, there are two mitigating circumstances:

1. the pattern of burstiness is relatively “uniform”. That is, the data
rate peaks every 13th frame.

2. the variations occur very fast, that is, high traffic persists for
only a single frame followed by low traffic.

Because of (2) the traffic can be smoothed out using a moderate sized
buffer, and (1) implies that the size of the buffer can be ascertained
with some confidence.
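
The buffer sizing argument can be made concrete with a simple occupancy model: bits arrive once per frame and are drained at a constant rate. The sketch below is illustrative only. The frame sizes are hypothetical round numbers, not simulation results, and the 12-frame period simply mirrors the IBBPBBPBBPBB pattern used in our simulator.

```python
def peak_buffer_occupancy(frame_bits, drain_per_frame):
    """Largest buffer occupancy (in bits) for a constant-rate drain."""
    occupancy = peak = 0
    for bits in frame_bits:
        # The buffer cannot go negative: an empty buffer simply idles.
        occupancy = max(0, occupancy + bits - drain_per_frame)
        peak = max(peak, occupancy)
    return peak


# Hypothetical traffic: one large I frame followed by 11 smaller frames.
pattern = [1_300_000] + [100_000] * 11
drain = sum(pattern) // len(pattern)   # drain at the average rate
peak = peak_buffer_occupancy(pattern * 10, drain)
delay_seconds = peak / (drain * 29.97)  # time to drain the peak backlog
print(peak, round(delay_seconds, 3))
```

Because the burst lasts a single frame and recurs at a fixed period, the peak occupancy is the same in every period, so the buffer size (and the delay it adds) can be fixed in advance.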

The delay constraints on the transmission preclude the use of the Space
Link ARQ procedure, while the delay constraint coupled with the high
rate argues against the use of the Insert and Multiplexing procedures.
This leaves the Bitstream procedure as the only viable candidate for
HDTV transmission. This conclusion coincides with that of the CCSDS Red
Book Audio, Video, and Still-Image Communication Services.

The Bitstream service fills the data field of the B-PDU (Bitstream
Protocol Data Unit) with the Bitstream data supplied at the user’s
request. Each B-PDU contains data for only one VC, identified by the
VCDU-ID parameter. Each bit is placed sequentially, and unchanged, into
the B-PDU data field. When the Bitstream data have filled one
particular B-PDU, the continuation of the Bitstream data is placed in
the next B-PDU on the same VC. Due to the delay constraints of the PDU
release algorithm, if a B-PDU is not completely filled by Bitstream
data at release time, a fill pattern has to be inserted into the
remainder of the B-PDU.

As far as the grade of service is concerned, one could use the error
protection service provided by the ADTV algorithm with Grade-3 service,
or discard any error protection from the ADTV signal and use Grade-2
service. Given the sketchy amount of information available about
forward error correction in the ADTV algorithm, we would suggest the use
of the Grade-2 service in the CCSDS recommendations. Some kind of
forward error correction is imperative because of the need for data
sequencing along with the general need for video integrity. Therefore,
Grade-2 service, which adopts Reed-Solomon encoding, is a logical
choice. According to the minimum predicted performance of Grade-2
service, the probability that a Coded Virtual Channel Data Unit (CVCDU)
will be missing is [...]. If we assume a CVCDU contains 8800 bits of
data, then from our simulation, about 95 macroblocks of video data (for
the ADTV format, a frame is formed by 90Hx60V macroblocks) will get
lost in a duration slightly over one and a half hours. This should not
hurt the quality too much in a motion compensation scheme. The
probability that a CVCDU contains an undetected bit error is [...], so
only one bit error will occur in a transmission period of over 11
hours. If this error bit occurs in video data, it will not be easy to
notice the degradation. But if the error bit occurs in control data,
some degree of damage is inevitable. It may therefore be desirable to
provide some more protection to the control data before it enters the
network. We are still looking at this particular issue.
Table 1. Coding rate and PSNR performance using the ADTV and H.261 techniques
(Susie sequence, 150 frames)

             Coding Rate   STDR*                 Average
             (Mbits/s)     (Mbits/s)    PAR**    PSNR      STDP*

Sequence 1      9.27          2.12      1.68     35.14     4.24
Sequence 2     15.34          2.26      1.33     37.64     4.52
Sequence 3     15.33          2.31      1.33     37.67     4.52
Sequence 4      9.39          0.61      1.65     36.83     0.81
Sequence 5     15.54          0.48      1.31     39.05     0.79
Sequence 6     15.54          0.49      1.31     38.42     0.49
Sequence 7     31.23          8.17      1.86     41.31     1.43
Sequence 8     17.27         10.72      3.37     38.77     1.68
Sequence 9     17.24         10.86      3.37     38.85     1.61

*  STDR : Standard deviation of coding rate
   STDP : Standard deviation of PSNR
** PAR  : Peak to average ratio

Frames/second : 29.97
Active video pixels : (Luma) 1440Hx960V, (Chroma) 720Hx480V

Sequence 1 : ADTV, p=3, t=1
Sequence 2 : ADTV, p=5, t=1
Sequence 3 : ADTV, p=5, t=3
Sequence 4 : H.261, p=3, t=1
Sequence 5 : H.261, p=5, t=1
Sequence 6 : H.261, p=5, t=3
Sequence 7 : ADTV, p=10, t=1, q=4 for intra-mode frame
Sequence 8 : ADTV, p=5, t=1, q=4 for intra-mode frame
Sequence 9 : ADTV, p=5, t=3, q=4 for intra-mode frame

Fig. 1 Coding Rate using ADTV Technique

Fig. 3 Coding Rate using H.261

Fig. 5 Coding Rate using Modified ADTV Technique

Fig. 7 Comparison of Coding Rate (p=5, t=1)


1 Introduction

With the current and future availability of an increasing number of
remote sensing instruments, the problem of storage and transmission of
large volumes of data has become a significant and pressing concern.
For example, the High-Resolution Imaging Spectrometer will acquire data
at 30 meter resolution in 192 spectral bands. This translates to a data
rate of 280 Mbps! The Spaceborne Imaging Radar - C (SIR-C) will
generate data at the rate of 45 Mbps per channel with four high data
rate channels [1]. To accommodate this explosion of data there is a
critical need for data compression.

One can view the utility of data compression in two different ways. If
the rate at which data is being generated exceeds the transmission
resources, one can use data compression to reduce the amount of data to
fit available capacity. Or, given some fixed capacity, data compression
permits the gathering of more information than could otherwise be
accommodated.

In this paper, we provide a survey of current data compression
techniques which are being used to reduce the amount of data in remote
sensing applications. The survey aspect of this paper is far from
complete, reflecting the substantial activity in this area. The purpose
of the survey is more to exemplify the different approaches being taken
rather than to provide an exhaustive list of the various proposed
approaches. For more information on compression techniques the reader
is referred to [2, 3, 4].

Compression techniques in remote sensing applications can be broadly
classified into three (non-distinct) categories. These are

1. Classification/Clustering

2. Lossless Compression

3. Lossy Compression

The rationale behind the classification approaches is that in a given
dataset, the end user is generally interested in particular features in
the data. The “dimensionality” of these features is generally
substantially less than the dimensionality of the data itself. Thus,
rather than transmitting the data in its entirety, if the features are
extracted on-board and transmitted, this can result in a significant
amount of compression. Lossless compression techniques provide
compression without any loss of information. That is, the raw data can
be exactly reconstructed from the compressed data. This is used when
the data, or some subset of the data, is needed in exact form. In many
cases, the data, such as remotely sensed images, will be viewed by a
human (as opposed to a machine). In these cases, distortions which are
not perceptually significant can be tolerated, and lossy compression,
which entails the discarding of some of the information, can be used.
The utility of this approach is closely related to the amount of
distortion incurred and the importance of fidelity in the particular
application. The classification approaches can be viewed as a form of
lossy compression. The three approaches are not mutually exclusive. For
example, one may use classification as the first step with the feature
vectors being losslessly encoded.


2 Classification 

If we assume an image to be composed of a small number 
of objects, then the most efficient form of data compression 
is to assign each pixel in the image to one of the objects, and 
then simply transmit the object labels to the ground. This 
idea is behind several high compression schemes which at- 
tempt to classify the pixels based on different features, and 
then transmit the classification map. 

A technique called BLOB was introduced by Kauth et al.
[5], which uses proximity information along with spectral
information for unsupervised clustering. The use of proximity
information allows for greater ease in the classification of
boundary pixel values, which otherwise could be classified
to a set different from the adjacent regions. BLOB would be
most useful in situations where objects have relatively well
defined boundaries.

Another object oriented unsupervised classification 
scheme is described in [6]. They use what they call the path 
hypothesis for object classification. The path hypothesis as- 
sumes spatial contiguity, and spectral nearness for different 
pixels belonging to the same object. The spectral features of 
the different objects are then extracted and used to classify 
the object. They report an increase in classification accuracy 
along with a decrease in the amount of data required. 

Hilbert [7] proposed a more general clustering algorithm.
He proposed dividing the data into blocks, and then
clustering them using an unsupervised procedure. The cluster
centroids were then transmitted, along with a feature map
describing the cluster to which each block belonged. This
approach does not depend on the existence of well defined
boundaries. Hilbert's technique is a precursor to current day
Vector Quantization algorithms which are discussed later.
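
The block clustering idea described above can be sketched in a
few lines of Python. This is only an illustration: the helper
names, the toy two-sample blocks, and the use of plain k-means
with fixed starting centroids are assumptions made here, and the
procedure in [7] may differ in its details.

```python
def nearest(block, centroids):
    """Index of the centroid closest to block (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((b - c) ** 2
                                 for b, c in zip(block, centroids[j])))


def cluster_blocks(blocks, centroids, iterations=10):
    """Plain k-means (an assumed stand-in for the unsupervised procedure).

    Returns the cluster centroids and the feature map, i.e. the index of
    the cluster to which each block belongs.
    """
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        labels = [nearest(b, centroids) for b in blocks]
        for j in range(len(centroids)):
            members = [b for b, lab in zip(blocks, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    feature_map = [nearest(b, centroids) for b in blocks]
    return centroids, feature_map


# Hypothetical data: six blocks of two samples each, two starting centroids.
blocks = [(10, 11), (12, 10), (11, 12), (50, 52), (51, 49), (49, 50)]
centroids, feature_map = cluster_blocks(blocks, centroids=[(0, 0), (60, 60)])
```

In this toy case only the two centroids and the six labels would
be transmitted in place of the twelve sample values.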

A common precursor to classification is the transformation
of the data using the Karhunen-Loeve Transform. The
Karhunen-Loeve transform is used to linearly transform data
into uncorrelated coordinates. This then makes the
classification task easier, as the coordinates can be clustered in a
multi-dimensional space, and then classified based on their
location in this space. The rows of the KL transform matrix
are the eigenvectors of the correlation matrix of the data.
These vectors will often be related to physical parameters.
For example in [8] the first and second eigenvectors
correspond to the response of the dominant surface covers.

Chen and Landgrebe [8] also show that it is sufficient to
send only clipped (hard limited to ±1) eigenfunctions
along with only a fraction of the coefficients to obtain
significant classification accuracy. They therefore propose the
use of this scheme aboard the HIRIS instrument.
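
To make the decorrelating property concrete, the following
Python sketch applies the transform to two-band data, where the
eigenvectors of a symmetric 2x2 matrix reduce to a rotation that
can be written in closed form. The function name and the sample
values are hypothetical, and the sketch uses the covariance of
the mean-removed data, which is an assumption about how the
correlation matrix is formed.

```python
import math


def klt_2d(samples):
    """Karhunen-Loeve transform of two-dimensional samples.

    Returns (rows, transformed): the rows of the transform matrix
    (unit eigenvectors of the 2x2 covariance matrix) and the
    mean-removed samples expressed in the new coordinates.
    """
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cxx = sum((x - mx) ** 2 for x, _ in samples) / n
    cyy = sum((y - my) ** 2 for _, y in samples) / n
    cxy = sum((x - mx) * (y - my) for x, y in samples) / n
    # The eigenvectors of a symmetric 2x2 matrix form a rotation by theta,
    # with tan(2 * theta) = 2 * cxy / (cxx - cyy).
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    rows = [(math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta))]
    transformed = [tuple(r[0] * (x - mx) + r[1] * (y - my) for r in rows)
                   for x, y in samples]
    return rows, transformed


# Hypothetical, strongly correlated two-band pixel values.
samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
rows, transformed = klt_2d(samples)
cross = sum(u * v for u, v in transformed) / len(transformed)
var_first = sum(u * u for u, _ in transformed) / len(transformed)
var_second = sum(v * v for _, v in transformed) / len(transformed)
```

After the transform the cross term vanishes to within rounding
error and nearly all of the variance sits in the first coordinate.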

3 Lossless Compression 

Lossless compression, as the name implies, consists of
reduction in the amount of data without sacrificing the
fidelity of the data. The earliest known lossless compression
technique of the technological age is probably the Morse
code. In the Morse code, letters that occur often, such as E,
are coded using short symbols, while letters that occur
relatively infrequently, such as Z, are represented by long
symbols (a single dot for E and dash dash dot dot for Z). This
idea (albeit in more sophisticated form) is at the heart of
most lossless compression schemes. In 1948 Claude Shannon
defined the amount of information contained in the
event X as $\log_a \frac{1}{P(X)}$ [9], where P(X) is the
probability of the event X and a is the base of the logarithm.
If a = 2 the unit of information is bits. If we define $X^n$ to
be the sequence of observations $(X_0, X_1, \ldots, X_{n-1})$,
then the entropy of the source generating the sequence is
defined as

$$H(S) = \lim_{n \to \infty} G_n$$

where

$$G_n = -\frac{1}{n} \sum P(X^n) \log_2 P(X^n)$$

and the sum is taken over all possible sequences $X^n$.

Shannon [9] showed that the minimum average rate at
which the output of the source S can be encoded is H(S)
bits/symbol. If the source outputs $\{X_i\}$ are independent then
the expression for entropy reduces to

$$H(S) = G_1 = -\sum P(X_i) \log_2 P(X_i)$$

Given a sequence of independent observations, Huffman
[10] developed an algorithm which provides a variable
length code which gives an average coding rate R, where
$H(S) \le R < H(S) + 1$. The algorithm assigns shorter
codewords to more probable symbols and longer codewords
to less probable symbols a la Morse. Another technique
which operates on sequences rather than individual letters is
Arithmetic Coding. The Arithmetic coding algorithm
guarantees an average coding rate R where
$H(S) \le R < H(S) + \frac{2}{n}$, n being the length of the
sequence. If the statistics of the sequence change with time,
these techniques will suffer some degradation. To combat this
several adaptive coding techniques have been proposed
including dynamic Huffman coding [11], adaptive arithmetic
coding [12] and the Rice algorithm [13]. The Rice algorithm
has been shown to be optimal under some widely available
conditions [14], and has been implemented in a VLSI chip
which can process 20 Mbytes per second [15].
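
The Huffman bound can be checked numerically with a short
Python sketch. The four-symbol source and its probabilities are
made up for illustration, and the function computes only the
codeword lengths, not the codewords themselves.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return {symbol: codeword length} for a dict of symbol probabilities."""
    # Heap entries are (probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)  # the two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


# Hypothetical source with four independent symbols.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
rate = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
```

For this source the probabilities are powers of two, so the
average rate meets the entropy exactly; for other sources it
falls somewhere within the one bit margin.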

If the observations are not independent then the code
designed using the first order probabilities $P(X_i)$ is only
guaranteed to be within one bit of $G_1$, which may be
substantially greater than H(S). Because of this fact lossless
compression consists of two steps: decorrelation and coding.

The first step can be seen as an 'entropy reduction' step in
which the redundancy or correlation of the data is removed
(reduced). This results in another sequence which has a first
order entropy $G_1$ which can be significantly lower than the
first order entropy of the original sequence. Now if a
variable length code is designed using the first order
probabilities of the decorrelated data, this will result in a lower
rate/higher compression. Consider for example the following
sequence

1 2 3 4 5 4 3 2 1 2 3 2 1 2 3 4 5 4 3 2 3 4 5

Estimating the first order probabilities from the sequence we
obtain

P[1] = P[5] = 3/23; P[2] = P[3] = 6/23; P[4] = 5/23

which gives a value for $G_1$ of 2.25 bits/sample. It is obvious
from looking at the data that it possesses some definite
structure. Some of this structure can be removed by storing
consecutive differences. The original data can be reconstructed
(without loss) by simple addition. The difference data is

1 1 1 1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 1 1 1

The difference can be represented using a binary
alphabet, so the coding rate can immediately be lowered to
one bit/sample. To see what the value of $G_1$ is we first
compute the first order probabilities as P[1] = 14/23,
P[-1] = 9/23, which gives an entropy of .96 bits/sample. In
this particular case the gain of 0.04 bits per sample may not be
worth the additional complexity required for a variable length
code. Notice that in this case the compression was obtained
mainly due to the decorrelation step. Because of this, research in
lossless compression is focusing more and more on
development of better decorrelation algorithms. An idea of
how much decorrelation gain is available can be obtained by
looking at the conditional entropy.
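
The numbers in this example can be reproduced with a few lines
of Python. The function name is hypothetical, and the first
difference is taken relative to an assumed initial value of 0 so
that the difference sequence has as many samples as the original.

```python
import math
from collections import Counter


def first_order_entropy(samples):
    """Entropy in bits/sample estimated from relative frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


seq = [int(ch) for ch in "12345432123212345432345"]

# Decorrelate by differencing; the value before the first sample is taken as 0.
diff = [b - a for a, b in zip([0] + seq[:-1], seq)]

# Reconstruct (without loss) by simple addition.
restored = []
running = 0
for d in diff:
    running += d
    restored.append(running)

h_seq = first_order_entropy(seq)    # approximately 2.257 bits/sample
h_diff = first_order_entropy(diff)  # approximately 0.966 bits/sample
```

The two entropies agree with the values of 2.25 and .96
bits/sample quoted in the text, which appear to be truncated
rather than rounded.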

In the example given above, the data was one dimensional
so the prediction used to generate the difference or
residual data was also one dimensional. In the case of
remotely sensed images, the data is generally three
dimensional: two spatial dimensions and a spectral dimension. In
these cases, it would seem reasonable to use prediction based
on all three dimensions. Chen et al. [16] examine the
theoretical advantages to be gained from using prediction
based on all three dimensions. They show that while there is
some advantage to be gained from using more than one
dimensional prediction, the increase in compression is modest.
However, if the increase in complexity of going from one to
two or three dimensions is acceptable (and it can be argued
that the increase in complexity is minimal), it would seem
reasonable to use multi-dimensional prediction to decorrelate
the data.

A somewhat different approach is adopted by Memon et
al. [17, 18]. They reason that in an image the correlation
may be maximum in the vertical, horizontal or diagonal
direction depending on the object being imaged. Therefore
one should use whichever pixel gives the most decorrelation
for prediction. They therefore develop the concept of
prediction (or scanning) trees for performing the decorrelation. The
drawback with this approach when coding a single image is
that the cost of encoding the prediction tree may eat up any
savings due to better decorrelation. In the case of
multispectral images, because the same prediction tree can be
used to code a large number of bands, the relative cost of
encoding the prediction tree is small enough not to overwhelm
the savings obtained via this approach [19].

In all that we have discussed above, we have taken a
rather general view of the lossless compression problem.
When faced with a specific problem, one can often come up
with a simpler, more efficient solution. Consider the problem
of encoding the output of a spectrometer. A general algorithm
like the Rice algorithm will do a nice job of encoding the
output of the spectrometer. However, given the very special
structure of the data (the data looks like a noisy decaying
exponential) one can come up with techniques, as in
[20], which are simpler and give better performance.
Similarly, Stearns et al. [21] develop a lossless compression scheme
tuned to the peculiarities of seismic data. When using
application specific algorithms, the user should be aware of the
fact that if the data sequence deviates from the assumed
structure, this may result in performance loss.

Finally, lossless coding can be used in conjunction with
other techniques. Several schemes in the literature use
lossless compression as the second stage, where the first is
feature extraction or lossy compression [22, 23, 24].


4 Lossy Compression 

In many applications, loss of information which is not
perceptually significant can be easily tolerated. In fact in certain
cases, such as processing of SAR data [25], the information
lost may actually be the noise. In these cases, it makes sense
to use lossy compression techniques which provide much
higher compression than the lossless techniques. However,
before we extol the virtues of various lossy compression
techniques, one should keep in mind the importance of
carefully picking the distortion measure. Most of the
compression schemes described here use the mean squared error (or
some variant) as the distortion measure. The mean squared
error is defined as

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2$$

where $x_i$ is the original data value and $\hat{x}_i$ is the
reconstructed (compressed and then decompressed) value. Note
that this is an average measure, therefore it will spread out the
error effects at any one location. Under this measure, a large
error in one sample value with no or little error in the other N-1
sample values may be equivalent to small errors in all N
sample values. If the application requires that each sample
value be represented within some tolerance, then the MSE is
probably not the distortion measure that should be used.
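
A small numerical illustration of this point, written in Python
with made-up sample values: one reconstruction has a single
large error, the other has a small error in every sample, and
the mean squared error cannot tell them apart.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences."""
    n = len(original)
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n


def max_abs_error(original, reconstructed):
    """Largest error at any single sample."""
    return max(abs(x - y) for x, y in zip(original, reconstructed))


# Hypothetical data: N = 16 samples, all equal to 10.
original = [10.0] * 16
one_large_error = [14.0] + [10.0] * 15  # error of 4 in one sample only
small_errors = [11.0] * 16              # error of 1 in every sample

mse_large = mse(original, one_large_error)            # 16 / 16
mse_small = mse(original, small_errors)               # 16 / 16
peak_large = max_abs_error(original, one_large_error)
peak_small = max_abs_error(original, small_errors)
```

Both reconstructions have the same MSE, yet the peak errors
differ by a factor of four, which is why a per-sample tolerance
calls for a different distortion measure.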

4.1 Quantization

The heart (and sometimes the totality) of most lossy
compression schemes is the quantization process. Quantization is
a many to one mapping from a possibly infinite set to a finite
set. The input to the quantizer can be a scalar, in which case
the quantizer is called a scalar quantizer, or a vector, in which
case the quantizer is called a vector quantizer. A
scalar quantizer is simply a concatenation of an A/D and a
D/A. A simple A/D is shown in Figure 1. Assuming Δ = 1, in
this A/D if the input falls in the range (0,1], the output is the
codeword 10, if the input falls in the range (-∞,-1] the
output is the codeword 00, and so on. The D/A takes the
codeword produced by the A/D and generates a real value
corresponding to the interval represented by the codeword.
In our simple example if the codeword 00 is received the
D/A will put out a value of -1.5. The input/output map for
this quantizer (A/D-D/A combination) is shown in Figure 2.
Figures 1 and 2 describe a two bit uniform quantizer. If the
stepsize Δ is not constant for the different intervals, the
quantizer is called a non-uniform quantizer. The design of a
quantizer depends on what is known
about the statistics of the input signal. Max [26] and Lloyd
[27] have developed algorithms for the design of optimum
uniform and non-uniform quantizers for different
sources. Kwok and Johnson [28] use a two bit quantizer designed
for Gaussian data to code SAR data from the Magellan
mission. The SAR data is originally at 8-bit resolution, so the
compression ratio is 4:1. To accommodate the rather large
dynamic range of the SAR data, the quantizer is adapted on
a block by block basis, using the average signal magnitude.
The signal magnitudes in a block are used to compute a
threshold value which is used in place of Δ in Figure 1. The
output of the D/A are the optimum values for a Gaussian
input with variance of one multiplied by the computed
threshold.
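
The two bit uniform quantizer described in this section can be
sketched in Python as follows. The function names are
hypothetical. The codewords 10 and 00 and the output value of
-1.5 are taken from the text; the assignment of 01 and 11 to
the remaining two intervals is an assumption, since Figure 1 is
not reproduced here.

```python
import math

DELTA = 1.0

# Index 0 is the lower outer region, index 3 the upper outer region.
# 00 and 10 follow the text; 01 and 11 are assumed.
CODEWORDS = ["00", "01", "10", "11"]


def a_to_d(x, delta=DELTA):
    """Map a real input to a two bit codeword (intervals closed on the right)."""
    # ceil(x / delta) is -1 on (-2d, -d], 0 on (-d, 0], 1 on (0, d], 2 on (d, 2d].
    index = math.ceil(x / delta) + 1
    index = max(0, min(3, index))  # clamp everything beyond into the outer regions
    return CODEWORDS[index]


def d_to_a(codeword, delta=DELTA):
    """Map a codeword to the midpoint value representing its interval."""
    return (CODEWORDS.index(codeword) - 1.5) * delta


def quantize(x, delta=DELTA):
    """A/D followed by D/A: the overall input/output map of the quantizer."""
    return d_to_a(a_to_d(x, delta), delta)
```

For a block adaptive scheme of the kind used for the Magellan
data, `delta` would be replaced by the threshold computed for
each block.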

The SIR-C [1] uses 8 bit uniform quantization followed
by a feature which allows it to reduce the number of bits per
sample to facilitate the acquisition of more samples. Data
compression thus allows the acquisition of more data at the
cost of reduced resolution.

In some cases it might be more efficient to quantize some
function of the data rather than the data itself. Dubois et al.
[29] compress the output of an imaging radar polarimeter by
first obtaining the Stokes matrix from the scattering
matrices. Four Stokes matrices from contiguous pixels are
added to form one four-look Stokes matrix. The elements of
the four-look Stokes matrix are then quantized. The
advantage to this approach is that the elements of the Stokes
matrix have certain well defined properties which can be
used in the quantization process of the Stokes matrix.


4.2 DPCM

The relationship between the variance of the input to the
quantizer and the MSE can be given by the following
relationship,

$$MSE = \epsilon^2 \sigma_x^2 2^{-2R}$$

where $\epsilon^2$ depends on the input probability density
function, $\sigma_x^2$ is the variance of the input, and R is the
number of bits/sample. As can be seen from this
expression, the MSE is proportional to the input signal
variance. Therefore, if we could reduce the input signal
variance this would lead to a reduction in the MSE. (It
should be noted that the operations to remove the
redundancy could also change the input pdf which may diminish the
benefits of a reduced variance.) This is the motivation for a
class of lossy compression schemes known as Differential
Pulse Code Modulation (DPCM) schemes. DPCM schemes
remove redundancy in the source sequence by using the
correlation in the source sequence to predict ahead. The
predicted value is removed from the signal at the transmitter
and reintroduced at the receiver. The prediction error, which
has a smaller variance than the input signal, is then quantized
and transmitted to the receiver. A block diagram of a
DPCM system is shown in Figure 3. This technique is used
in the coding of the SPOT satellite's panchromatic band.

Figure 4. DPCM coding of a one dimensional edge
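
A minimal DPCM loop is sketched below in Python. It assumes the
simplest possible predictor (the previous reconstructed sample)
and a two bit uniform quantizer with unit stepsize; the function
names and the test signal are hypothetical and the block diagram
of Figure 3 may use a more general predictor.

```python
def quantize_error(e, delta=1.0):
    """Two bit uniform quantizer with outputs at +-0.5 and +-1.5 times delta."""
    if e > delta:
        return 1.5 * delta
    if e > 0:
        return 0.5 * delta
    if e > -delta:
        return -0.5 * delta
    return -1.5 * delta


def dpcm(signal, delta=1.0):
    """Return (quantized prediction errors, reconstruction at the receiver)."""
    prediction = 0.0
    errors, reconstruction = [], []
    for x in signal:
        # Transmitter: remove the predicted value and quantize what is left.
        q = quantize_error(x - prediction, delta)
        # Transmitter and receiver both reintroduce the prediction; the
        # reconstructed value becomes the prediction for the next sample.
        prediction = prediction + q
        errors.append(q)
        reconstruction.append(prediction)
    return errors, reconstruction


# Hypothetical input: a flat region followed by a sharp step.
signal = [0.0, 0.0, 0.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0]
errors, reconstruction = dpcm(signal)
# reconstruction == [-0.5, 0.0, -0.5, 1.0, 2.5, 4.0, 5.5, 6.0, 5.5]
```

In the flat region the reconstruction stays within half a step
of the input, while after the step it needs several samples to
catch up because the quantizer output is limited to 1.5.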

While DPCM coding performs well in quasi-stationary
regions of an image, it does a poor job in edge regions. The
reason for this is that the prediction in DPCM uses the
previous reconstructed pixels. In an edge region, the prediction
error is quite large. Therefore, the input to the quantizer
lands in one of the outer regions ((-∞,-1], [1,∞) in our
example). The quantization error can therefore be quite large.
This is fed back via the prediction process into the coding of
the next pixel, and so on, causing a smearing of the edges.
This process is demonstrated on a one-dimensional edge in
Figure 4. This problem can be overcome by using recursively
indexed quantization [30, 31], which avoids the large
quantization error problem by operating the quantizer in two
different modes. Whenever the input to the quantizer falls in
the external regions, the quantizer switches into a recursive
mode, and the quantization error is requantized until the
error falls within some predetermined tolerance. This
approach not only prevents large quantization errors from
propagating through the coded sequence, it also guarantees
that the error per pixel will be less than a pre-determined
value. To show how well this scheme works, we code the
aerial view of Omaha shown in Figure 5. The compressed
(and decompressed) image coded using the DPCM scheme
described above at a rate of 1.4 bits per pixel is shown in
Figure 6. Note that while there is an overall increase in
'blurriness' the distortion introduced does not blur the edges.

Figure 5. Original aerial view of Omaha

Figure 6. DPCM coded Omaha image at 1.4 bpp

While the DPCM structure removes substantial amounts
of the redundancy from the data stream, it should be
remembered that the prediction process in the DPCM structure is
linear, and can therefore remove only those redundancies
which are expressed as linear processes. For example, a
slowly varying sequence 1 2 3 4 5 4 3 3 4 5 6 7 7 7 6 has
redundancies that can be modeled by a linear process. However,
we can easily come up with sequences that have
redundancies that can not be characterized by a linear process, such as
4 24 15 19 4 24 15 19. This fact has been used by some to
improve the data compression by making use of this
redundancy for code selection [32], and by others for providing
error protection [33].


4.3 Vector Quantization 

Until now we have been talking about quantization as a
scalar process; however, the basic idea of quantization can
easily be extended to the vector case. Scalar quantization can
be viewed as a partition of the real number line, with the
A/D doing the partitioning, and the D/A providing a
representative value for each partition. Similarly, vector
quantization can be seen as a partitioning of multidimensional space.
While conceptually the problems of scalar and vector
quantization approaches are very similar, the practical problem of
designing vector quantizers is significantly more difficult.

Two somewhat different approaches have been taken
towards the design of vector quantizers. The first is a
clustering approach similar to the Hilbert technique [7]. In this
approach [34], a training sequence is used to identify the
regions in multi-dimensional space where the data seems to
cluster. The quantizer outputs are the centroids of these
clusters, and the partitions are the nearest neighbor partitions
of these centroids. An example of a two dimensional vector
quantizer is shown in Figure 7. The VQ in Figure 7 contains
4 output levels, or codewords. Thus the size of each
codeword is two bits. But each output level corresponds to
the coding of two input samples, therefore, the number of
bits per sample is one. In general, given the dimension of the
vector d and the number of bits per sample R, the size of the
vector codebook is $2^{Rd}$. Notice that this means an
exponential increase in the size of the codebook with dimensionality
and rate. For example, given d = 12 and R = 2, the size of the
codebook would be $2^{24}$ = 16777216! This represents an
enormous expense in storage and computing resources. Thus the
rate-dimension product provides a limitation on the clustered
VQ designs. Fortunately, a lot can be done at low
rate-dimension products. For more moderate rate-dimension products a
number of somewhat more structured VQ algorithms have
been developed [35]. Chang et al. [25] report the use of a
tree-structured VQ on Seasat SAR imagery with favorable
results. As the codebook of the VQ is obtained by training, it
is important that the data in the training set be representative
of the data in the test set. If this is not the case, there can be
significant degradation in the data that is not typical of the
training set [25]. The Omaha image of Figure 5 is coded
using a clustered VQ at 0.5 bits per pixel. The result is
shown in Figure 8. The VQ dimension was 16 (4X4 blocks)
and this is evident from the coded image in Figure 8, where
there is a noticeable amount of blockiness. The blocks that
lie on the edges of objects in the image clearly distort the
edges. The VQ codebook was obtained using another aerial
image. We can improve the performance of this algorithm
by increasing the rate and/or by generating the codebook
from an image which more closely resembles the image
being coded. In Figure 9 we have the Omaha image coded at
1 bit per pixel using a codebook generated using the Omaha
image itself. There is substantial improvement in the quality,
though there is still some distortion in the lower quarter of
the picture. It should be noted that the use of the image to
generate the codebook is generally not realistic.

Figure 7. A four level, two dimensional vector quantizer

Figure 8. VQ coded Omaha image at 0.5 bpp
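
The rate and codebook size arithmetic above can be illustrated
with a short Python sketch. The four codewords below are
hypothetical and do not correspond to the quantizer of Figure 7;
the encoder is a plain nearest neighbor search.

```python
def codebook_size(dimension, bits_per_sample):
    """Number of codewords needed for a given dimension and rate."""
    return 2 ** (dimension * bits_per_sample)


def vq_encode(vector, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((v - c) ** 2
                                 for v, c in zip(vector, codebook[j])))


# Hypothetical four level, two dimensional codebook (two bits per vector).
codebook = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]

# Six input samples are grouped into three vectors of dimension two.
samples = [0.9, 1.2, -0.7, 0.4, -1.1, -0.8]
vectors = list(zip(samples[0::2], samples[1::2]))
indices = [vq_encode(v, codebook) for v in vectors]
decoded = [codebook[i] for i in indices]

bits_per_vector = 2  # log2 of the number of codewords
bits_per_sample = bits_per_vector * len(indices) / len(samples)
```

Three two bit indices represent six samples, giving one bit per
sample, while `codebook_size(12, 2)` reproduces the figure of
16777216 codewords quoted in the text.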

Vector Quantization is also used by Gupta and Gersho for
the coding of Landsat TM images [36]. They use a vector
DPCM system with vector quantization in the spatial
domain, and predictive encoding in the spectral domain. A
variation of predictive VQ is also used by Giusto [37] for the
compression of multispectral images.

The rate-dimension product constraint on vector
quantizers can be lifted by making the vector quantizer more and
more structured. Of course, as the VQ acquires more and
more structure of its own, it is less and less responsive to
structure in the data. The most structured vector quantizers
are those based on a multi-dimensional lattice [38]. While
these quantizers do an excellent job of quantization, they
cannot at the same time perform the redundancy removal
operation performed by the clustered VQs. They therefore have to
be used in conjunction with other techniques to provide
compression [39, 40].

4.4 Transform Coding

Most of the techniques we have talked about operate in
the data domain, i.e. without any transformation. There is a
large class of compression techniques that operate on a
transformed version of the data. They are called transform coding
techniques. The idea behind transform coding is to transform
the data in such a way as to compact most of the energy (and
information) into a few coefficients. These coefficients can
then be coded, while other coefficients can be discarded,
thereby achieving data compression. The most efficient
transform from the compaction point of view is the
Karhunen-Loeve [2] transform. However, the Karhunen-Loeve
transform is data dependent, which makes it impractical for
most compression applications. The best alternative to the
Karhunen-Loeve transform is the Discrete Cosine
Transform (DCT). This is a real, separable, unitary transform
that is the basis for an image compression standard [41].
Because of its popularity in image compression various fast
algorithms have been proposed for its implementation [42,
43].

Figure 9. VQ coded Omaha image at 1.0 bpp

Figure 10. Zig zag

The ESA Huygens Titan Probe to be launched by the
Cassini Orbiter will use the DCT for compressing the image data
acquired during its descent through Titan's atmosphere. The
images of size 256X256 will be divided into 8X8 blocks.
These blocks will be transformed and the transform
coefficients reordered using the zigzag ordering shown in Figure
10. The ordered coefficients will then be blocked into
substrings of four coefficients each. Substrings with all coefficient
values below a specified threshold will be deleted while the
remainder will be quantized using scalar quantizers. Details
can be found in [44, 45].
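
The processing chain described above (block transform, zigzag
reordering, deletion of small substrings) is sketched below in
Python. This is not the flight algorithm of [44, 45]: the
function names, the test block, the threshold, and the JPEG-style
direction of the zigzag scan are all assumptions, and the DCT is
computed directly from its definition rather than by a fast
algorithm.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, column) positions along anti-diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        diagonal = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(diagonal if d % 2 else reversed(diagonal))
    return order


def keep_substrings(coefficients, threshold, length=4):
    """Delete substrings whose coefficients are all below the threshold.

    Returns (substring index, substring) pairs for the survivors, which
    would then go on to scalar quantization.
    """
    groups = [coefficients[k:k + length]
              for k in range(0, len(coefficients), length)]
    return [(k, g) for k, g in enumerate(groups)
            if any(abs(c) >= threshold for c in g)]


# Hypothetical smooth block: a gentle horizontal ramp.
block = [[100.0 + 2.0 * j for j in range(N)] for _ in range(N)]
coefficients = dct_2d(block)
ordered = [coefficients[r][c] for r, c in zigzag_order()]
kept = keep_substrings(ordered, threshold=1.0)
```

For a smooth block like this one the energy is compacted into a
handful of low frequency coefficients, so most of the sixteen
substrings are deleted.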


To see the artifacts introduced by DCT coding we have
coded the Omaha image at 0.5 bits per pixel and 1 bit per
pixel as shown in Figures 11 and 12. Note the substantial
block artifacts in Figure 11 which have been reduced to a
large extent in Figure 12. However even in Figure 12 one
can see significant distortion in edge regions.

An adaptive version of DCT was also considered by
Chang et al. [25] for the compression of Seasat SAR
imagery. They compare the DCT technique with a VQ
technique and decide in favor of the VQ technique based on
complexity issues. With the wide acceptance of the DCT as
an image compression standard, the complexity issue may
no longer be relevant, as more and more manufacturers are
bringing hardware implementations of the DCT to the
market.


5 Conclusions 

As can be seen from this discussion, there is a substantial
amount of on-going activity in the area of data compression
for remote sensing applications. This will only increase as
there is more and more need for data compression. However,
there are several areas of research which have not been
addressed in any significant way.

There is a need for the development of better distortion
measures which can be then used to develop more
sophisticated compression algorithms. It is possible that rather than a
single distortion measure, a set of distortion measures will be
needed for different applications. The development of such
measures, and algorithms utilizing these measures, requires
close cooperation between data compression specialists and
the scientists and engineers who are the end-users of the data
obtained through remote sensing.

The multi-dimensional (spatial and spectral) nature of the
data has not really been thoroughly explored (except in the
classification approaches). With the development and
deployment of high spectral resolution instruments, this
particular aspect of remotely sensed data will become more
important. Compression schemes which take advantage of this
fact need to be developed. An analogy could be drawn with
the development of compression algorithms for video as
opposed to still images. However, the algorithms developed for
video cannot be directly applied to high spectral resolution
image data sets, as the differences that occur between frames
of a video sequence are not the same as the differences that
occur between different spectral images. It would seem that
VQ approaches such as [36] would provide possible
solutions. The rate-dimension constraints in clustering VQ could
be avoided by the use of Lattice VQ techniques. Another
approach described in [46] is to use a two step strategy, in
which the first step is used to model the data in the spectral
direction. The resulting models are then treated as a vector
image for compression in the spatial directions. Beyond this,
however, there is a need for the development of three
dimensional approaches, both to model the data, and develop
compression algorithms.
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Abstract 

In a standard image coding scenario, pixel-to-pixel correlation nearly always exists in the
data, especially if the image is a natural scene. This correlation is what allows predictive coding
schemes (e.g., DPCM) to perform efficient compression. In a color-mapped image, the values
stored in the pixel array are no longer directly related to the pixel intensity. Two color indices
which are numerically adjacent (close) may point to two very different colors. The correlation
still exists, but only via the colormap. This fact can be exploited by sorting the color map to
reintroduce the structure. In this paper we study the sorting of colormaps and show how the
structure can be used in both lossless and lossy compression of images.

* This work was supported by the NASA Goddard Space Flight Center (NAG 5-1612) and the NASA
Lewis Research Center (NAG 3-806).
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1 Introduction 

Many lower-cost image display systems use color-mapped (or pseudo-color) displays. While
there has been considerable attention devoted to the compression of monochrome and full-color
images, the compression of color-mapped images has not received similar attention.

The human eye can distinguish hundreds of thousands of different colors in a color space,
depending on viewing conditions [1]. A full-color (also called true-color) frame buffer provides
a means of displaying this wide range. Such a system is illustrated in Figure 1.



Figure 1 Full-Color Frame Buffer

Many applications of digital images benefit from, or require, color capabilities to be effective.
If a full-color display is used, an application may become too costly to implement practically.
Also, the images involved require large amounts of storage space, whether in display memory or
on a mass-storage device. A less expensive solution is needed.

These applications naturally lead to the pseudo-color or color-mapped frame buffer, shown in
Figure 2. This type of display is typical of those found on personal computers and workstations.
A smaller amount of image memory is required, one-third that of the full-color system example.
The values stored in memory are used as indices into a 24-bit table, the colormap.




Compression of Color-mapped Images — A. C. Hadenfeldt 


Each entry in the colormap consists of 8-bit values for the red, green, and blue portions of
the pixel. These three values are then passed through DACs to the red, green, and blue electron
guns of the CRT, as with the full-color system. The color-mapped system allows the display of a
small number of colors at a time, $2^8$ for the system shown in the figure, which can be selected
from a larger set of colors ($2^{24}$ for this example). By careful selection of the colors in the
colormap, a large variety of images can be displayed, often with quality approaching that of a
full-color display system.



Figure 2 Color-Mapped Frame Buffer 

The use of the colormap, however, disguises the spatial structure in the image. An indication
of this can be obtained by calculating the zero-th and first order entropies of the image. These
quantities were computed using the index arrays for the four test images shown in Figure 3, and
are listed in Table 1.
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Tabic 1 Entropies of the Source Images 


Image      H0       H1

Lena       7.617    1M3
Park       lAlO     7.797
Omaha      7.242    7.165
Lincoln    5.916    6.674


The large values of H1 in the table verify that the spatial correlation in the image pixel
values has been reduced by the color-mapping process. The values of H0 are also relatively
large, a direct result of selecting an 8-bit colormap. The data compression due to the color
quantization process implies that the color index values stored in the image are more critical
than similar values in, for example, an achromatic image. To further verify this, an experiment
was conducted in which errors were introduced in the least significant bit of a color-mapped
image, similar to what might be encountered if the color index values were quantized. The
resultant images were of poor subjective quality at best, and often completely unrecognizable.
Since quantization is a part of many popular source coding schemes, the available choices for
compression schemes become limited.


2 Colormap Sorting 

In the previous section, several problems unique to color-mapped images were discussed.
The root of these problems is that the colormap indices stored in the image have little relationship
with each other, which complicates coding for progressive transmission. In this section, methods
of restoring this relationship are discussed.

Colormap sorting is a combinatorial optimization problem. Treating the K colormap entries
as vectors, the problem is defined as follows. Given a set of vectors $\{a_1, a_2, \ldots, a_K\}$
in a three-dimensional vector space and a distance measure $d(i,j)$ defined between any two
vectors $a_i$ and $a_j$, find an ordering function $L(k)$ which minimizes the total distance D:

$$D = \sum_{k=1}^{K-1} d(L(k), L(k+1))$$

The ordering function L is constrained to be a permutation of the sequence of integers
$\{1, \ldots, K\}$.

Another possibility results when the list of colormap entries is considered as a ring structure. That
is, the colormap entry specified by $L(K)$ is now considered to be adjacent to the entry specified
by $L(1)$. In this case, an additional term of $d(L(K), L(1))$ is added to D.
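The cost D can be evaluated directly from its definition. The following is a minimal sketch
assuming unweighted Euclidean distance and 0-based indices (the text numbers entries from 1);
the function name `path_cost` and the three sample vectors are hypothetical.

```python
from math import dist


def path_cost(colors, order, ring=False):
    """Total distance D along `order`, a permutation of range(len(colors)).

    With ring=True the closing term d(L(K), L(1)) is added.
    """
    cost = sum(dist(colors[order[k]], colors[order[k + 1]])
               for k in range(len(order) - 1))
    if ring:
        cost += dist(colors[order[-1]], colors[order[0]])
    return cost


colors = [(0, 0, 0), (3, 4, 0), (3, 4, 12)]
linear_cost = path_cost(colors, [0, 1, 2])            # 5 + 12
ring_cost = path_cost(colors, [0, 1, 2], ring=True)   # 5 + 12 + 13
```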

The sorting problem is similar to the well-known travelling salesman problem, and is identical
if the colormap is considered as a ring structure. As such, the problem is known to be
NP-complete [3], and the number of possible orderings to consider is $\frac{1}{2}(K-1)!$. Algorithms
exist which can solve the problem exactly [4][5]; however, these algorithms are computationally
feasible only for K no greater than about 20. Efficient algorithms for locating a local minimum
exist [4] for K < 145. For large colormaps such as K = 256, another approach is necessary.
Two techniques were tested. The first is a "greedy" technique, discussed in Section 2.1. The
second is an algorithm which has performed well in practice, known as simulated annealing.
Simulated annealing was chosen as the sorting method for the colormaps in this work, and is
described in more detail in Section 2.2.
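To give a feel for why exhaustive search is ruled out, the count of distinct ring orderings,
(K-1)!/2, can be evaluated exactly with integer arithmetic. This is an illustrative sketch
only; the function name is hypothetical.

```python
from math import factorial


def ring_orderings(k):
    """Number of distinct ring orderings of k entries (k >= 3): (k - 1)! / 2.

    Rotations and reflections of a ring describe the same tour, hence the
    division of k! by 2k.
    """
    return factorial(k - 1) // 2


small = ring_orderings(5)                  # 12 tours for five colors
digits_for_256 = len(str(ring_orderings(256)))
```

Already at K = 20 the count is about 6 x 10^16, and for K = 256 it has several hundred
decimal digits.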


The distance metric d was chosen to be (unweighted) Euclidean distance, and different color
spaces were investigated. Three color spaces were selected: the NTSC RGB space, the CIE
L*a*b* space, and the CIE L*u*v* space. The NTSC RGB space was chosen since it corresponds
to the color primaries of the original images. Color spaces which can be linearly transformed to
the NTSC RGB space were not considered, since the use of an unweighted Euclidean distance
measure would give similar results for such a color space. The two CIE color spaces were
selected since they provide a means to measure perceptual color differences.
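As an illustration of measuring distance in a perceptual space, the sketch below converts RGB
values to CIE L*a*b* and takes the Euclidean distance there. The L*a*b* formulas are the
standard CIE 1976 definitions; the RGB-to-XYZ matrix is the commonly quoted one for the NTSC
primaries and is an assumption here, since the text does not list the coefficients it used.
RGB components are assumed to be linear values in [0, 1], and all names are hypothetical.

```python
from math import dist

# Assumed NTSC RGB -> CIE XYZ matrix (not given in the text).
NTSC_RGB_TO_XYZ = (
    (0.607, 0.174, 0.200),
    (0.299, 0.587, 0.114),
    (0.000, 0.066, 1.116),
)


def rgb_to_xyz(r, g, b):
    """Linear transform from RGB to XYZ tristimulus values."""
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in NTSC_RGB_TO_XYZ)


def _f(t):
    """Cube root with the CIE linear segment near zero."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def xyz_to_lab(xyz, white):
    """CIE 1976 L*a*b* coordinates relative to the given reference white."""
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_distance(rgb1, rgb2):
    """Unweighted Euclidean distance between two RGB colors in L*a*b*."""
    white = rgb_to_xyz(1.0, 1.0, 1.0)   # reference white = RGB (1, 1, 1)
    lab1 = xyz_to_lab(rgb_to_xyz(*rgb1), white)
    lab2 = xyz_to_lab(rgb_to_xyz(*rgb2), white)
    return dist(lab1, lab2)


black_to_white = lab_distance((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```

Black and white differ only in lightness, so their distance equals the full L* range of 100.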


2.1 Sorting Using Simulated Annealing 

Simulated annealing [3][6] is a stochastic technique for combinatorial minimization. The
basis for the technique comes from thermodynamics and observations concerning the properties
of materials as they are cooled. The technique described in this section is based on the
implementation in [6].

To illustrate this concept, consider an iron block. At high temperatures, the iron molecules
move freely with respect to each other. If the block is quenched (cooled very quickly), the
molecules will be locked together in a high-energy state. On the other hand, if the block is
annealed (cooled very slowly), the molecules will tend to redistribute themselves as they lose
energy, with the result being a lower energy lattice which is much stronger. The distribution of
molecular energies is characterized by the Boltzmann distribution:

$$P(E) \sim \exp\left(-\frac{E}{kT}\right)$$

where E is the energy state, T is the temperature, and k is Boltzmann’s constant. The significance
of this distribution is that even at low temperatures, there is some probability that a molecule
will have a high energy. In a combinatorial optimization situation, the Boltzmann distribution
can be used to temporarily allow increases in the cost function, while still generally striving to
achieve a minimum.

Solving the colormap sorting problem involves selecting each color only once while 
minimizing the sum of the distances between the colors. To find a solution using simulated 
annealing, an initial path through the nodes (colors) is chosen, and its cost computed. The 
algorithm then proceeds as follows: 

1. Select an initial temperature T and a cooling factor $\alpha$.

2. Choose a temporary new path by perturbing the current path (see below), and compute
   the change in path cost, $\Delta E = E_{new} - E_{old}$. If $\Delta E \le 0$, accept the new path.

3. If $\Delta E > 0$, randomly decide whether or not to accept the path. Generate a random
   number r from a uniform distribution in the range [0, 1), and accept the new path
   if $r < \exp(-\Delta E / T)$.

4. Continue to perturb the path at the current temperature for I iterations. Then, "cool"
   the system by the cooling factor: $T_{new} = \alpha T_{old}$. Continue iterating using the new
   temperature.

5. Terminate the algorithm when no path changes are accepted at a particular temperature.

The decision-making process is known as the Metropolis algorithm. Note that the decision
process will allow some changes to the path which increase its cost. This makes it possible for
the simulated annealing method to avoid easily being trapped in a local minimum of the cost
function. Hence, the algorithm is less sensitive to the initial path choice.
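The acceptance rule of steps 2 and 3 can be written as a single small function. This is a
sketch of the Metropolis decision only, with a hypothetical function name; Boltzmann's
constant is absorbed into the temperature, as in the acceptance test above.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a path change with cost change delta_e is accepted."""
    if delta_e <= 0:
        return True                      # never reject an improvement
    # Uphill moves are accepted with probability exp(-delta_e / T).
    return rng.random() < math.exp(-delta_e / temperature)


rng = random.Random(0)
trials = 10000
uphill_rate = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials)) / trials
```

For a cost increase equal to the temperature, roughly exp(-1), or about 37%, of the proposed
changes are accepted; as T falls, uphill moves become increasingly rare.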

For the images of this work, initial values of T ranged from 80 to 500, depending on the
color space used. The cooling factor $\alpha$ was usually chosen as 0.9. The simulated annealing
algorithm seemed to be most sensitive to the choice of this value, as values outside the range
[0.85, 0.95] caused the cooling to occur too slowly or too quickly. The number of iterations
per temperature I was chosen as 100 times the number of nodes (colors), or 25,600. However,
to improve the execution speed of the algorithm an improvement suggested by [6] was added,
which causes the algorithm to proceed to the next temperature if (10 x number of nodes) = 2560
successful path changes are made at a given temperature.

Also, a method for perturbing the path must be selected. In this work, the perturbations
were made using the suggestions of Lin [4][6]. At each iteration, one of two possible changes
to the path is made, chosen at random. The first is a path transport, which removes a segment
of the current path and reinserts it at another point in the path. The location of the segment,
its length, and the new insertion point are chosen at random. The second perturbation method,
called path reversal, removes a segment of the current path and reinserts it at the same point
in the path, but with the nodes in reverse order. The location and length of the segment are
again randomly chosen.
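Putting the steps and the two perturbations together gives the following sketch of the sorting
procedure. It is a simplified illustration, not the implementation used in this work: the
full path cost is recomputed for every candidate, changes that leave the cost unchanged are
ignored so that they do not count as accepted moves, a cap on the number of temperature steps
(`max_steps`) is added as a safeguard, and the best path seen so far is returned. Those last
three choices, the function names, and the default parameter values are assumptions.

```python
import math
import random


def path_cost(colors, order, ring):
    """Sum of Euclidean distances along the path (closed if ring is True)."""
    cost = sum(math.dist(colors[a], colors[b]) for a, b in zip(order, order[1:]))
    if ring:
        cost += math.dist(colors[order[-1]], colors[order[0]])
    return cost


def perturb(order, rng):
    """Apply a random path reversal or path transport to a copy of the path."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    segment = order[i:j + 1]
    if rng.random() < 0.5:
        # Path reversal: reinsert the segment in place, nodes reversed.
        return order[:i] + segment[::-1] + order[j + 1:]
    # Path transport: reinsert the segment at a random point of the remainder.
    rest = order[:i] + order[j + 1:]
    k = rng.randrange(len(rest) + 1)
    return rest[:k] + segment + rest[k:]


def anneal(colors, t0=100.0, alpha=0.9, ring=False, seed=0, max_steps=200):
    """Return (order, cost) for a low-cost ordering of the colormap entries."""
    rng = random.Random(seed)
    n = len(colors)
    order = list(range(n))
    cost = path_cost(colors, order, ring)
    best_order, best_cost = order, cost
    temperature = t0
    for _ in range(max_steps):
        accepted = 0
        for _ in range(100 * n):                 # I = 100 x number of nodes
            candidate = perturb(order, rng)
            delta = path_cost(colors, candidate, ring) - cost
            if delta < 0 or (delta > 0 and
                             rng.random() < math.exp(-delta / temperature)):
                order, cost = candidate, cost + delta
                accepted += 1
                if cost < best_cost:
                    best_order, best_cost = order, cost
                if accepted >= 10 * n:           # early exit suggested by [6]
                    break
        if accepted == 0:                        # step 5: nothing accepted
            break
        temperature *= alpha                     # step 4: cool the system
    return best_order, best_cost


# Eight hypothetical colors lying on a line, stored in scrambled order.
colors = [(10 * v, 0, 0) for v in (5, 1, 7, 3, 0, 6, 2, 4)]
sorted_order, sorted_cost = anneal(colors)
```

Calling `anneal(colors, ring=True)` treats the colormap as a ring instead of a linear list.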

The algorithm outlined in the previous paragraphs formulates colormap sorting as a travelling
salesman problem. This type of problem usually assumes a complete tour will be made (i.e.,
the salesman desires to return to the original city). Hence, the colormap is assumed to have a
ring-like structure. However, the simulated annealing technique can also be used if this is not
the case, allowing the colormap to be considered as a linear list structure. Experiments using
both structures were conducted.


3 Colormap Sorting and Lossless Compression

The results of sorting the colormaps of the test images using simulated annealing are shown
in the following tables. Table 2 shows results for sorting the colormap as a circular ring structure,
while Table 3 shows the results of sorting the colormap as a linear structure. Given in the tables
are values for the resulting first-order entropy and the final path cost (the distance measure D).


Table 2 Resultant Images With Circularly Sorted Colormaps

             RGB Space        L*a*b* Space       L*u*v* Space
Image Name   Cost     H1      Cost       H1      Cost      H1
Lena         13.88    5.641    857.80    5.627   208.49    5.480
Park         19.32    6.325   1609.46    6.330   310.41    6.218
Omaha        11.04    6.209   1081.82    6.303   363.21    6.178
Lincoln      10.62    5.513   1193.88    5.831   224.06    5.478


Table 3 Resultant Images With Linearly Sorted Colormaps

             RGB Space        L*a*b* Space       L*u*v* Space
Image Name   Cost     H1      Cost       H1      Cost      H1
Lena         11.68    5.575    847.29    5.933   200.31    5.512
Park         15.66    6.260   1509.25    6.115   292.29    6.546
Omaha        10.81    6.532   1004.69    6.554   283.66    6.199
Lincoln      10.61    5.774   1177.80    6.120   204.64    5.735


Note that the zero-order entropy H0 is not changed by the sorting process, since permuting
the colormap entries does not change the frequency of occurrence of a particular color. The lower
first-order entropies of the resultant images indicate that some of the spatial correlation between
color indices has been restored in each case. The sorting results for the NTSC RGB space show
that sorting in this space yields good results, if entropy reduction (the first goal stated above) is
the goal. However, the L*u*v* space sorting gives better results, with the added advantage that
the perceptual differences between colormap entries have been considered. Hence, the resultant
images from this sort should also be able to accept quantization errors while maintaining good
subjective quality, the second goal stated previously. We examine this further in the next section.
In terms of lossless compression, the sorting has resulted in a drop of 2 bits per pixel for the
Lena image and 1 to 1.5 bits per pixel for the other images. For a 512x512 image this translates
to a savings of between 32,768 and 65,536 bytes per image. For a large database of images this
could be a considerable saving.
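The byte figures quoted above follow from simple arithmetic, shown here as a short sketch
(the function name is hypothetical).

```python
def bytes_saved(width, height, bits_per_pixel_saved):
    """Storage saved, in bytes, for a given reduction in bits per pixel."""
    return width * height * bits_per_pixel_saved / 8


low = bytes_saved(512, 512, 1)    # 1 bit per pixel saved
high = bytes_saved(512, 512, 2)   # 2 bits per pixel saved
```

A 512x512 image has 262,144 pixels, so each bit per pixel saved is worth 32,768 bytes.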


4 Colormap Sorting and Lossy Compression 

The sorting of the colormap restores some perceptual structure to the colormap indices in the
sense that indices close in numerical value are also close in some perceptual sense. Therefore it
should be possible to introduce errors into the indices without destroying the image. To verify this
hypothesis, we dropped the three least significant bits of the L*u*v*-sorted Park image. Good
subjective results were obtained using quantization levels down to as low as 5 bits/pixel from the
8-bit original. Figure 4 shows the colormap for the Park image, before and after sorting. The
sorted colormap shown was sorted as a linear list in L*u*v* space. Figure 5 shows the result of
quantizing the Park image to 5 bits/pixel, before and after the colormap has been sorted. A caveat
is in order here. While the distances between the eight-bit indices have more perceptual meaning,
the sorted colormap image should not be assumed to have the same properties as an eight-bit
monochrome image. In some cases, if the distance between the original and reconstructed
(compressed and decompressed) indices is large enough, there might be a drastic change in color
between those pixels in the original and reconstructed image. In the monochrome case large
distances would correspond to changes in shading which might be overlooked by the viewer.
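The bit-dropping experiment can be mimicked on synthetic data. In the sketch below, the
"sorted" colormap is a hypothetical, perfectly ordered gray ramp and the "unsorted" colormap
is a random shuffle of it; neither is taken from the test images, and the function names are
assumptions.

```python
import random
from math import dist


def drop_lsbs(index, bits=3):
    """Quantize an 8-bit index by zeroing its `bits` least significant bits."""
    return (index >> bits) << bits


def mean_color_error(colormap, bits=3):
    """Average color distance between each entry and its quantized index."""
    errors = [dist(colormap[i], colormap[drop_lsbs(i, bits)])
              for i in range(len(colormap))]
    return sum(errors) / len(errors)


sorted_map = [(v, v, v) for v in range(256)]   # neighbors are similar colors
unsorted_map = sorted_map[:]
random.Random(1).shuffle(unsorted_map)         # neighbors are unrelated colors

sorted_error = mean_color_error(sorted_map)
unsorted_error = mean_color_error(unsorted_map)
```

With the ordered map, an index error of at most 7 moves the color only slightly; with the
shuffled map the same index error can land on an arbitrary color.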

To see how well the sorted color-mapped images lend themselves to lossy compression we
compress them using particular implementations of two popular lossy compression techniques,
the Discrete Cosine Transform (DCT) and Differential Pulse Code Modulation (DPCM).

4.1 DCT Coding of Color-mapped Images 

For Figure 6, we coded the Lena image at two bits per pixel using the unsorted colormap.
As can be seen from the figure, the original image is totally lost and all
that remains is seemingly random colors. It should be noted that for eight-bit monochrome images,
DCT coding at two bits per pixel generally provides a reconstruction which is indistinguishable
from the original.

In Figure 7 we show the same image, this time with the sorted colormap, coded at two bits
per pixel with a fixed bit allocation [7]. The images in Figure 8 were coded at two and one
bit per pixel using the JPEG [8] algorithm.* Note that while the image coded using the fixed bit
allocation shown in Figure 7 is far superior to the image in Figure 6, there are still quite a few
annoying artifacts. This is because of the nonadaptive nature of the algorithm which, while it
minimizes the average error, may permit the introduction of large errors in individual blocks. As
the color-mapped images are particularly sensitive to large errors, this could account for the low
quality reproduction. The JPEG algorithm adapts its bit allocation on a block-by-block basis.
Therefore, the image in Figure 8(b), which is coded at half the rate of the image in Figure 7,
still provides superior quality.

4.2 DPCM Coding of Color-mapped Images 

Standard DPCM coding of color-mapped images is problematic because in the busy regions
of images, especially edges, the prediction error is generally large, leading to large overload
noise values. In monochrome images these noise values result in a blurred look around edges,
which may be acceptable for certain applications. However, in color-mapped images these noise
values will result in splotches of different colors. The Edge Preserving DPCM (EPDPCM) system
avoids this problem by the use of a recursively indexed quantizer [9,10], in which the magnitude
of the quantization error is always bounded by $\Delta/2$, where $\Delta$ is the quantizer step
size. This attribute makes it ideal for application
to the coding of color-mapped images. Another advantage of the EPDPCM system is that, as
the quantizer output alphabet can be kept small without incurring overload error, the output is
amenable to entropy coding.
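The idea of recursive indexing can be sketched as follows. This is an illustrative
reading of the technique in [9,10], not the EPDPCM implementation: a uniform quantizer with
a small index alphabet represents a large value by repeating its extreme index and then
sending the remainder. The function names and the parameter `max_index` (which must be at
least 1) are assumptions.

```python
import math


def riq_encode(x, step, max_index):
    """Encode x as a list of indices, each in [-max_index, max_index]."""
    q = math.floor(x / step + 0.5)       # unbounded uniform quantizer index
    indices = []
    while q >= max_index:                # too large: emit the top index, recurse
        indices.append(max_index)
        q -= max_index
    while q <= -max_index:               # too small: emit the bottom index
        indices.append(-max_index)
        q += max_index
    indices.append(q)                    # final index has magnitude < max_index
    return indices


def riq_decode(indices, step):
    """Reconstruct the value by summing the indices of one encoded sample."""
    return step * sum(indices)


codes = riq_encode(23.0, 2.0, 3)         # -> [3, 3, 3, 3, 0]
value = riq_decode(codes, 2.0)           # -> 24.0
```

Because the extreme index is repeated rather than clipped, there is no overload: the
reconstruction error stays within half a step however large the input is, while the
alphabet has only 2 x max_index + 1 symbols.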

Results using the EPDPCM system are shown in Figure 10. The image in Figure 10(a) was 
coded at a rate of 2 bits per pixel, while the image in Figure 10(b) was coded with 1.35 bpp. 


* The JPEG coded images were coded using software from the independent JPEG foundation.

The advantage of DPCM systems over transform coding systems is their low complexity
and higher speed. However, the reconstruction quality obtained using transform coding systems
is generally significantly higher than that of DPCM systems at a given rate. Comparing Figure
10(a) and 8(a), this is obviously not the case for the sorted color-mapped images. In fact, the
quality of the two-bit EPDPCM coded image is actually somewhat higher than the two-bit DCT
coded image. Thus using the EPDPCM system provides advantages both in terms of complexity
and speed, and reconstruction quality.


5 Conclusion 

In this paper we have shown that use of sorted colormaps makes color-mapped images
amenable to both lossless and lossy compression. For lossy compression conventional wisdom
dictates the use of DCT coding for most types of images. However, for color-mapped images
DPCM coding might be more advantageous.


6 References 

[1] Foley, J.D., A. van Dam, S.K. Feiner, and J.F. Hughes, Computer Graphics: Principles
and Practice (Second Edition), Reading, MA: Addison-Wesley, 1990.

[2] Graphics Interchange Format (GIF) Specification, CompuServe, Inc., Columbus, OH,
June 1987.

[3] Aarts, E., and J. Korst, Simulated Annealing and Boltzmann Machines, New York: John
Wiley and Sons, 1989.

[4] Lin, S., “Computer Solutions of the Traveling Salesman Problem,” Bell System Technical
Journal, pp. 2245-2269, December 1965.

[5] Bellman, R.E., and S.E. Dreyfus, Applied Dynamic Programming, Princeton, NJ:
Princeton University Press, 1962.

[6] Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes in
C, New York: Cambridge University Press, 1988.

[7] Jayant, N.S., and P. Noll, Digital Coding of Waveforms, Prentice-Hall, 1984.

[8] Wallace, G.K., “The JPEG still picture compression standard,” Communications of the
ACM, 34(4):31-44, April 1991.

[9] Rost, M.C., and K. Sayood, “An Edge Preserving Differential Image Coding Scheme,”
IEEE Transactions on Image Processing, 1:250-256, April 1992.

[10] Sayood, K., and S. Na, “Recursively Indexed Quantization of Memoryless Sources,”
IEEE Transactions on Information Theory, IT-38, November 1992.




List of Figures 


Figure 1. Full-Color Frame Buffer 
Figure 2. Color-Mapped Frame Buffer 
Figure 3. Test Images 

Figure 4. Colormap for Park Image (a) before and (b) after sorting 

Figure 5. Park image quantized to five bits per pixel with (a) unsorted and (b) sorted 
colormaps 

Figure 6. Lena image with unsorted colormap coded at two bits per pixel using JPEG 
DCT algorithm 

Figure 7. Lena image coded at two bits per pixel using DCT with fixed bit allocation 

Figure 8. Lena image coded at (a) two bits per pixel and (b) one bit per pixel using 
JPEG algorithm 

Figure 9. DPCM structure 

Figure 10. Lena image coded at (a) 2 bits per pixel and (b) 1.35 bits per pixel using 
EPDPCM 





Figure 3. Test Images













Figure 6. Lena image with unsorted colormap coded at two bits per
pixel using JPEG DCT algorithm




IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 40, NO. 9, SEPTEMBER 1992



A Robust Coding Scheme for Packet Video 
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Abstract — We present a layered packet video coding algorithm 
based on a progressive transmission scheme. The algorithm 
provides good compression and can handle significant packet 
loss with graceful degradation in the reconstruction sequence. 
Simulation results for various conditions are presented. 

I. Introduction 

Due to the rapid evolution in the fields of image processing
and networking, video information will be an important
part of tomorrow’s telecommunication system. Up to now,
video has been mainly transported over circuit-switched
networks. It is quite likely that packet-switched
networks will dominate the communications world in the
near future. Asynchronous transfer mode (ATM) techniques in
broadband-ISDN can provide a flexible, independent and
high-performance environment for video communication.
Therefore, it is necessary to develop techniques for video
transmission over such networks.

The classic approach in circuit switching is to provide 
a “dedicated path,” thus reserving a continuous bandwidth 
capacity in advance. Any unused bandwidth capacity on the 
allocated circuit is therefore wasted. Rapidly varying signals, 
like video signals, require too much bandwidth to be ac- 
commodated by a standard circuit-switching channel. With 
a certain amount of capacity assigned to a given source, 
if the output rate of that source is larger than the channel 
capacity, quality will be degraded. If the generating rate is 
less than the available capacity, the excess channel capacity is 
wasted. The use of packet networks allows for the utilization 
of channel sharing protocols between independent sources and 
can improve channel utilization. Another point that strongly 
favors packet-switched networks is the possibility that the 
integration of services in a network will be facilitated if all 
of the signals are separated into packets with the same format. 

Some coding schemes which support packet video have
been explored. Verbiest and Pinnoo proposed a DPCM-based
system which is comprised of an intrafield/interframe
predictor, a nonlinear quantizer, and a variable length coder [1]. Their
codec obtains stable picture quality by switching between three
different coding modes: intrafield DPCM, interframe DPCM,
and no replenishment. Ghanbari has simulated a two-layer
conditional replenishment codec with a first layer based on
hybrid DCT-DPCM and second layer using DPCM [2]. This
scheme generates two types of packets: “guaranteed packets”
contain vital information and “enhancement packets” contain
“add-on” information. Darragh and Baker presented a sub-band
codec which attains a user-prescribed fidelity by allowing
the encoder’s compression rate to vary [3]. The codec’s design
is based on an algorithm that allocates distortion among
the sub-bands to minimize channel entropy. Kishino et al.
describe a layered coding technique using discrete cosine
transform coding, which is suitable for packet loss
compensation [4]. Karlsson and Vetterli presented a sub-band coder
using DPCM with a nonuniform quantizer followed by
run-length coding for baseband and PCM with run-length coding
for nonbaseband [5]. In this paper, a different coding scheme
based on a progressive transmission scheme called Mixture
Block Coding with Progressive Transmission (MBCPT) [6],
[7] is investigated. Unlike the methods mentioned above,
MBCPT does not use decimation and interpolation filters to
separate the signals into sub-bands. However, it does have the
attractive property of dealing separately with high frequency
and low frequency information. This separation is obtained by
the use of variable blocksize transform coding.

This paper is organized as follows. First, some of the 
important characteristics and requirements of packet video are 
discussed. In Section III, the coding scheme called mixture 
block coding with progressive transmission (MBCPT) is pre- 
sented. In Section IV, a network simulator used in testing the 
scheme is introduced. In Section V the simulation results are 
discussed. Finally, in Section VI the paper is summarized. 

II. Characteristics of Packet Video 

The demand for various services, such as telemetry, terminal 
and computer connections, voice communications, and full- 
motion high-resolution video, along with the wide range 
of bit rates and holding times they represent, provides an 
impetus for building a Broadband Integrated Service Digital 
Network (B-ISDN). B-ISDN is a projected worldwide public 
telecommunications network that will service a wide range 
of user needs. The continuing advances in the technology 
of optical fiber transmission and integrated circuit fabrication 
have been driving forces to realize B-ISDN. The idea of 
B-ISDN is to build a complete end-to-end switched digital 
telecommunication network with broadband channels. Still to 
be precisely defined by CCITT, with fiber transmission, H4 
has an access rate of about 135 Mbps. 

Packet-switched networks have the unique characteristics of
dynamic bandwidth allocation for transmission and switching
resources, and the elimination of channel structure. They
acquire and release bandwidth as needed. Because the video
signals vary greatly in bandwidth requirement, it is attractive
to utilize a packet-switched network for video coded signals.
Allowing the transmission rate to vary, video coding based
on packet transmission permits the possibility of keeping
the picture quality constant, by implementing “bandwidth on
demand.” There are three main merits when transmitting video
packets over a packet-switched network.

1) Improved and consistent image quality: If video signals 
are transmitted over fixed-rate circuits, there is a need to keep 
the coded bit rate constant, resulting in image degradation 
accompanying rapid motion. 

2) Multimedia integration: As mentioned above, integrated 
broadband services can be provided using unified protocols. 

3) Improved transmission efficiency: Using variable bit-rate 
coding and channel sharing among multiple video sources, 
scenes can be transmitted without distortion if other sources, 
at the same time, are without rapid motion. 

However, video transmission over packet networks also has
the following drawbacks.

1) The time taken to transmit a packet of data may change 
from time to time. 

2) Packets may be delayed to the point where, because of 
constraints due to the human visual system, they have to be 
discarded. 

3) Headers of packets may be changed because of errors 
and delivered to the wrong receiver. 

It has to be emphasized that the delay/loss effect can
reach very high levels if the combined users’ requirements
exceed the available bandwidth and may seriously damage
the quality of the image.

When the signals transmitted in the network are nonstation- 
ary and circuit-switching is used with limited bandwidth, a 
buffer between the coder and the channel is needed to smooth 
out the varying rate. If the amount of data in the buffer exceeds 
a certain threshold, the encoder is instructed to switch into a 
coding mode that has lower rate but worse quality to avoid 
buffer overflow. In packet-switched networks, asynchronous 
time division multiplexing (ATDM) can efficiently absorb 
temporal variations of the bit-rate of individual sources by 
smoothing out the aggregate of several independent streams 
in the common network buffers [8]. 

To deliver packets in a limited time and provide a real time 
service is a difficult resource allocation and control problem, 
especially when the source generates a high and greatly 
varying rate. In packet-switched networks, packet losses are 
inevitable, but use of a packet-switched network yields a 
better utilization of channel capacity. However, it should be 
noted that the varying rate requirements of the video coder 
may not be synchronized with the variations in available 
channel capacity which changes depending on the traffic in 
the network. Therefore, the interactions between the coder and 
the network have to be considered and incorporated into the 
requirements for the coder. These requirements include the 
following. 

1) Adaptability of the coding scheme: The video source 
we are dealing with has a varying information rate. So it is 
expected that the encoder should generate different bit rates 
by removing the redundancy. When the video is still, there is 
no need to transmit anything. 


2) Insensitivity to error: The coding scheme has to be 
robust to the packet loss so that the quality of the image 
is never seriously damaged. Remember that retransmission is 
impossible because of the tight timing requirement. 

3) Resynchronization of the video: Because of the vary- 
ing packet-generating rate and the lack of a common clock 
between the coder and the decoder, we have to find a way 
to reconstruct the received data which is synchronous to the 
display terminal. 

4) Control coding rate: Sensing the heavy traffic in the 
network, the coding scheme is required to adjust the coding 
rate by itself. In the case of a congested network, the coder 
could be switched to another mode which generates fewer bits 
with a minimal degradation of image quality. 

5) Parallel architecture: The coder should preferably be 
implemented in parallel. That allows the coding procedure to 
be run at a lower rate in many parallel streams. 

In the next section, we investigate a coding scheme to see 
how well it satisfies the above requirements. 

III. Mixture Block Coding with Progressive
Transmission

Mixture block coding (MBC) is a variable-blocksize trans- 
form coding algorithm which codes the image with different 
blocksizes depending upon the complexity of that block area. 
Low-complexity areas are coded with a large blocksize trans- 
form coder while high-complexity regions are coded with 
small blocksize. The complexity of the specific block is 
determined by the distortion between the coded and original 
image when the same number of bits are used to code each 
block. A more complex image block has higher distortion. The 
advantage of using MBC is that it does not process different 
complex regions with the same blocksize. That means MBC 
has the ability to choose a finer or coarser coding scheme to 
deal with different complex parts of the same image. With the 
same rate, MBC is able to provide an image of higher quality 
than a coding scheme which codes different complex regions 
with the same blocksize coder. 

When using MBC, the image is divided into maximum 
blocksize blocks. After coding, the distortion between the 
reconstructed block and the original block is calculated. The 
block being processed is subdivided into smaller blocks if 
that distortion fails to meet the predetermined threshold. 
The coding-testing procedure continues until the distortion 
is small enough or the smallest blocksize is reached. In this 
scheme, every block is coded until the reconstructed image is 
satisfactory and then moves to the next block. 
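The coding-testing procedure described above can be sketched as a recursive subdivision. The
sketch makes two simplifying assumptions that are not from the paper: each block is "coded"
by its mean value rather than by a transform coder, and distortion is measured as mean
squared error. The function names and the test image are hypothetical.

```python
def block_mean(image, top, left, size):
    """Stand-in coder: represent a block by its average pixel value."""
    total = sum(image[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)


def block_mse(image, top, left, size, value):
    """Distortion between the original block and its coded version."""
    total = sum((image[r][c] - value) ** 2
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)


def mbc_blocks(image, top, left, size, threshold, min_size=2):
    """Return the (top, left, size, value) blocks chosen for one region."""
    value = block_mean(image, top, left, size)
    if size <= min_size or block_mse(image, top, left, size, value) <= threshold:
        return [(top, left, size, value)]      # good enough, or smallest size
    half = size // 2
    leaves = []
    for dr in (0, half):                       # otherwise split into four
        for dc in (0, half):
            leaves += mbc_blocks(image, top + dr, left + dc, half,
                                 threshold, min_size)
    return leaves


# Flat 16 x 16 image with a single bright pixel in the top-left corner.
image = [[0] * 16 for _ in range(16)]
image[0][0] = 255
leaves = mbc_blocks(image, 0, 0, 16, threshold=1.0)
```

Only the region around the bright pixel is subdivided down to 2 x 2; the flat regions stay
as large blocks.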

Mixture block coding with progressive transmission
(MBCPT) is a coding scheme which combines MBC and
progressive coding. Progressive coding is an approach that
allows an initial image to be transmitted at a lower bit
rate which can later be updated [9]. In this way, successive
approximations converge to the target image with the first
approximation carrying the “most” information and the
following approximations enhancing it. The process is like
focusing a lens, where the entire image is transformed from
low-quality into high-quality. In progressive coding, every
pixel value, or the information contained in it, is possibly
coded more than once and the total bit rate may increase
due to different coding scheme and quality desired. Because
only the gross features of an image are being coded and
transmitted in the first pass, the processing time is greatly
reduced for the first pass and a coarse version of the image
can be displayed without significant delay. It has been shown
that it is perceptually useful to get a crude image in a short
time, rather than waiting a long time to get a clear complete
image.

Fig. 1. Structure of the first pass consisting of 16 x 16 blocks for MBCPT.

With different stopping criteria, progressive coding is
suitable for dynamic channel capacity allocation. If a
predetermined distortion threshold is met, processing is stopped and
no more refining action is needed. The threshold value can
be adjusted according to the traffic condition in the channel.
Successive approximations (or iterations) are sent through the
channel in progressive coding and lead the receiver to the
desired image. If these successive approximations are marked
with decreasing priority, then a sudden decrease in channel
capacity may only cause the received image to suffer from
quality degradation rather than total loss of parts of the images.

MBCPT is a multipass scheme in which each pass deals
with a different blocksize. The first pass codes the image with
the maximum blocksize and transmits it immediately. Only those
blocks which fail to meet the distortion threshold go down to
the second pass, which processes the difference image block
(the difference between the original and the coded image obtained
in the first pass) with smaller blocks. The difference image coding
scheme continues until the final pass, which deals with the minimum
size block. At the receiving end, a crude image is obtained
from the first pass in a short time and the data from the following
passes serve to enhance it. Fig. 1 shows the structure of a
pass consisting of 16 x 16 blocks for MBCPT. Fig. 2 shows the
parallel structure of MBCPT. Coding algorithms using quad
trees have also been proposed by Dreizen [10] and Vaisey and
Gersho [11]. In the quad tree coding structure of this paper,
the 16 x 16 block is coded and the distortion of the block is
calculated. If the distortion is greater than the predetermined
threshold for 16 x 16 blocks, the block is divided into four 8 x 8
blocks for additional coding. This coding-checking procedure
is continued until the only image blocks not meeting the
threshold are those of size 2 x 2. Fig. 3 shows the algorithm.
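The coding-checking procedure can be sketched as a recursive split. The following is a minimal illustration only, not the coder used in this paper: the block mean stands in for the zonal transform coder, and the threshold values and function names are hypothetical.

```python
# Illustrative quad-tree split. A block is approximated by its mean (a
# stand-in for the paper's transform coder) and is divided into four when
# its mean squared error exceeds the threshold for that block size.

def block_mse(image, top, left, size):
    """Mean squared error of approximating the block by its average value."""
    pixels = [image[r][c] for r in range(top, top + size)
              for c in range(left, left + size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def quadtree_leaves(image, top, left, size, thresholds, min_size=2):
    """Return (top, left, size) for every block at which splitting stops."""
    if size == min_size or block_mse(image, top, left, size) <= thresholds[size]:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree_leaves(image, top + dr, left + dc, half,
                                      thresholds, min_size)
    return leaves

# A flat 16 x 16 block with one bright 2 x 2 patch in the top-left corner.
image = [[0] * 16 for _ in range(16)]
for r in range(2):
    for c in range(2):
        image[r][c] = 200
thresholds = {16: 10.0, 8: 10.0, 4: 10.0}  # hypothetical values
leaves = quadtree_leaves(image, 0, 0, 16, thresholds)
```

In this example only the quadrant containing the patch is refined down to 2 x 2 blocks, while the flat regions stop at 8 x 8 or 4 x 4.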

Fig. 2. Parallel structure for MBCPT.

The blocksize used in the coding scheme should be small
enough for ease of processing and storage requirements, but
large enough to limit the inter-block redundancy [12]. Large
blocksizes result in higher compression, but it is very difficult
to build real-time hardware for blocksizes larger than 16 x
16 because of the increase in the number of computations.
So, 16 x 16 is chosen to be the largest blocksize. The
minimum blocksize determines the finest visual quality that
is achievable in busy areas. If the minimum blocksize
is too large, blockiness can be observed along the
coded edges of curved objects because the coding block is
square. In order to match the zonal transform coding used
in this paper, 2 x 2 is the smallest blocksize and there are
four passes (16 x 16, 8 x 8, 4 x 4, 2 x 2) in this scheme.
Figs. 4-7 show images from the four passes.

After applying the discrete cosine transform, only four
coefficients, namely the dc and the three lowest order frequency
coefficients, are coded and the others are set to zero. The dc
coefficient in the first pass is coded with an 8-bit uniform
quantizer because it closely reflects the average
gray level of that image block and is hard to model. The
dc coefficient in the subsequent passes follows a Laplacian
model, and a 5-bit optimal Laplacian nonuniform quantizer
is used to code it. The ac coefficients also follow a Laplacian
model with a variance greater than that of the dc coefficient and
can therefore also be coded using a Laplacian quantizer. As
an alternative, an LBG vector quantizer with a codebook
size of 512 is used to quantize the vector which comprises the three
ac coefficients. The initial threshold of each pass is selected
beforehand and is readjustable during operation according
to the channel condition and the quality required.

Fig. 4. Image reconstructed from first pass.

Fig. 5. Image reconstructed from first two passes.

Fig. 6. Image reconstructed from first three passes.
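As an illustration of the zonal selection described above, the sketch below applies an orthonormal two-dimensional DCT to a block, keeps only the 2 x 2 low-frequency zone and inverts the transform. It is a sketch under assumptions: the choice of the 2 x 2 zone as "the dc and three lowest order coefficients" is an interpretation, no quantization is performed, and all function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def zonal_code(block):
    """Forward DCT, keep the 2 x 2 low-frequency zone, inverse DCT."""
    n = len(block)
    c = dct_matrix(n)
    coeffs = matmul(matmul(c, block), transpose(c))
    kept = [[coeffs[u][v] if u < 2 and v < 2 else 0.0 for v in range(n)]
            for u in range(n)]
    recon = matmul(matmul(transpose(c), kept), c)
    return coeffs, recon

flat = [[10.0] * 4 for _ in range(4)]
flat_coeffs, flat_recon = zonal_code(flat)      # only the dc term is nonzero
small = [[1.0, 2.0], [3.0, 4.0]]
_, small_recon = zonal_code(small)              # a 2 x 2 block is kept whole
```

For a 2 x 2 block the retained zone covers every coefficient, which is one way to read the statement that 2 x 2 is the smallest blocksize matching the zonal coder.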

Because only those blocks which fail to meet the distortion
threshold need to be coded, side information is needed to
instruct the receiver on how to reconstruct the image. One
bit of overhead is needed for each block. If a block is to
be divided, a 1 is assigned as its overhead; if not, a 0
is assigned. The example shown in Fig. 8 has the following
overhead: 1, 1001, 1001, 1001, 1001, 1001.
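A minimal sketch of how such per-block overhead bits could be generated, one string per pass, is given below. It assumes that blocks of the minimum size carry no bit because they cannot be divided further; the `split` predicate is a hypothetical stand-in for the encoder's distortion test, and the ordering of the four sub-blocks is also an assumption.

```python
def overhead_bits(split, size=16, min_size=2):
    """Per-pass overhead: one bit per block visited, 1 = divided, 0 = not.

    `split` is a hypothetical predicate (top, left, size) -> bool that stands
    in for the distortion-threshold test of the encoder.
    """
    passes = []
    current = [(0, 0)]
    while size > min_size and current:
        bits, children = "", []
        half = size // 2
        for top, left in current:
            if split(top, left, size):
                bits += "1"
                children += [(top, left), (top, left + half),
                             (top + half, left), (top + half, left + half)]
            else:
                bits += "0"
        passes.append(bits)
        current, size = children, half
    return passes

# Example: only the block containing pixel (0, 0) fails the threshold.
example = overhead_bits(lambda top, left, size: top == 0 and left == 0)
```

The receiver can replay the same traversal, reading one bit per block, to learn which blocks have refinement data in the next pass.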

The interframe coder used in this paper is a differential
scheme which is based on MBCPT. This coder processes
the difference image between the current frame and the
previous frame, which is locally decoded from the data of the first
three passes. Fig. 9 shows the algorithm of this coder. Fig. 10
shows a different scheme which does the local decoding with
all four passes. From Fig. 11, it can be seen that when there
is no packet loss, the performances of these two schemes are
nearly the same. But when congestion occurs in the network,
with the priorities assigned to packets, packets from pass 4 are
expected to be discarded first. In this case, the performance
(from Fig. 12) of the scheme in Fig. 9 is much better than
that of the one in Fig. 10. Therefore the coding scheme in Fig. 9 is
used in our simulation. In this paper, the Kronkite motion
sequence from the USC database with 16 frames is used as
the simulation source. Every image is 256 x 256 pixels with
gray levels ranging from 0 to 255. It is similar to a video
conferencing type image which has neither rapid motion nor
scene changes. Due to this characteristic, advanced techniques
like motion detection or motion compensation have not been
used but could be implemented when broadcasting video.

Fig. 8. Overhead assignment and zonal coding (overhead = 1, 1001, 1001, 1001, 1001, 1001).

From the datastream output that is listed in Table II, we can
see that the data in pass 4 represents 30-40% of the entire data.
This part of the data increases the sharpness
of the image and is usually labeled with the lowest priority in
the network. We therefore call this the least significant pass
(LSP). Because they have a substantial probability of being discarded
due to low priority, the packets from pass 4 are not used
to reconstruct the locally decoded image that is stored in the
frame memory. This prevents packet loss errors from propagating
into following frames if the lost packet belongs to pass 4.
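The effect of keeping the LSP out of the prediction loop can be illustrated with a toy one-dimensional model. This is a sketch under strong simplifying assumptions: a coarse uniform quantizer stands in for passes 1-3, a fine one for pass 4, and all names and step sizes are hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer: nearest multiple of `step`."""
    return step * round(value / step)

def encode_sequence(frames, coarse=16, fine=2):
    """Differential coder whose frame memory is built without the LSP."""
    memory = [0] * len(frames[0])               # locally decoded previous frame
    stream = []
    for frame in frames:
        diff = [x - m for x, m in zip(frame, memory)]
        base = [quantize(d, coarse) for d in diff]                 # passes 1-3
        lsp = [quantize(d - b, fine) for d, b in zip(diff, base)]  # pass 4
        memory = [m + b for m, b in zip(memory, base)]             # LSP excluded
        stream.append((base, lsp))
    return stream

def decode_sequence(stream, lsp_lost):
    """Decode, skipping the LSP of every frame flagged in `lsp_lost`."""
    memory = [0] * len(stream[0][0])
    shown = []
    for (base, lsp), lost in zip(stream, lsp_lost):
        memory = [m + b for m, b in zip(memory, base)]
        shown.append(list(memory) if lost
                     else [m + r for m, r in zip(memory, lsp)])
    return shown

frames = [[10, 50, 90], [12, 55, 80], [15, 60, 70]]
stream = encode_sequence(frames)
clean = decode_sequence(stream, [False, False, False])
lossy = decode_sequence(stream, [False, True, False])
```

Losing the LSP of the second frame degrades only that frame; the third frame is decoded identically in both runs because the encoder and decoder memories never depend on pass 4 data.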
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Fig. 9. Differential MBCPT coding scheme (1). 
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Fig. 10. Differential MBCPT coding scheme (2). 


IV. Simulation Network 

The network simulator used for this study was a modified
version of an existing simulator developed by Nelson et al.
[13]. A brief description of the simulator is provided here.

Fig. 11. Performance of two differential MBCPT schemes without packet loss.

Fig. 12. Performance of the two MBCPT schemes with packet losses from pass 4.


A. Introduction

As mentioned in Section II, tomorrow's integrated telecom-
munication network is a very complicated and dynamic struc-
ture. Its efficiency requires sophisticated monitoring and con-
trol algorithms with communication between nodes reflecting
the existing capacity and reliability of system components.
The scheme for communicating information regarding the
operating status is called the system protocol. Since the
communication of system information must flow through
the channel, it reduces the overall capacity of the physi-
cal layers, but hopefully provides a more efficient system
overall. Therefore, system efficiency depends entirely upon
these protocols, which, in turn, depend upon the system
topology, communication channel properties, nodal memory
and component reliability. Most network protocols have been
developed to provide high reliability in topological structures
with reasonably high channel reliability.

To suit the purpose of this study, most of the modifica-
tions made to the simulator were in the modules
concerning the network layer. Since the simulator is structured
in modules which represent, to some degree, the ISO Model for
packet switched networks, a more detailed description of
the network layer modules follows.

B. The Network Layer and Basic Operation 

The simulation of a layer at each node is represented by
a “processor” and one or more “packet queues.” All events
are scheduled through the “Sim_Q” which drives the simu-
lator. Initially, the processors are all idle, the packet queues
are all empty and the only tasks scheduled are the arrival
of messages at the various nodes. The simulator operates
by examining the next event and performing the task
indicated. The task may result in the scheduling of additional
events, generally referred to as task completion times. When
a message or packet is placed in the input queue at a node
for a given layer, the processor for that queue is marked as
busy, the packet is removed from the queue, and the task to be
performed by the processor is scheduled for completion. When
the task is completed (as a result of the simulator reaching
that point in time), the “processor” examines the queue. If the
queue is empty, the processor is set idle; otherwise it removes
the next message or packet from the queue and schedules the
completion of the operation which must be performed. The
layers in the simulator are quite close in operation to the ISO
transport, network and datalink layers.

TABLE I
Performance Degradation due to Packet Loss in Different Passes

            PSNR (dB) with packet losses only in
Frame #     Pass 4     Pass 3     Pass 2     Pass 1
   0        40.30      40.30      40.30      40.30
   1        40.59      40.37      40.12      37.55
   2        40.07      39.02      36.15      31.99
   3        39.70      38.19      35.82      31.70
   4        40.19      38.35      36.31      30.18
   5        39.65      38.05      35.21      28.35
   6        38.74      36.27      33.23      26.07
   7        38.59      35.58      31.52      24.61
   8        38.68      34.96      30.81      23.27
   9        38.51      34.33      29.85      21.77
  10        39.48      34.31      29.86      21.54
  11        39.26      34.01      29.67      21.90
  12        38.83      33.75      29.57      22.00
  13        38.54      33.09      29.46      22.30
  14        38.86      33.21      29.52      22.34
  15        39.47      33.24      29.37      22.33
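The event loop described above can be sketched with a priority queue of timed tasks. This is an illustrative reduction to one processor and one packet queue, not the simulator of [13]; the task names and the fixed service time are assumptions.

```python
import heapq
from collections import deque

def simulate(arrival_times, service_time):
    """Return {packet_id: completion_time} for one processor and one queue."""
    sim_q = []                          # entries: (time, tie_breaker, task, id)
    for pid, t in enumerate(arrival_times):
        heapq.heappush(sim_q, (t, pid, "arrive", pid))
    counter = len(arrival_times)        # keeps tie-breakers unique
    queue, busy, finished = deque(), False, {}
    while sim_q:
        now, _, task, pid = heapq.heappop(sim_q)   # examine the next event
        if task == "arrive":
            queue.append(pid)
        else:                                       # task completion
            finished[pid] = now
            busy = False
        if not busy and queue:                      # idle processor takes work
            nxt = queue.popleft()
            busy = True
            heapq.heappush(sim_q, (now + service_time, counter, "complete", nxt))
            counter += 1
    return finished

finished = simulate([0.0, 1.0, 1.5], service_time=2.0)
```

Each task either enqueues work or frees the processor, and scheduling a completion is the only way new events enter the queue after initialization.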

1) The Session Layer: In the OSI model, the session layer
(SL) allows users to establish “sessions” on local or remote
systems. In the simulator, it contains a
relatively simple model of the subscribers, participates in flow
control, and acts as a statistics collector for messages arriving
and delivered. At message arrival time (from Sim_Q), the
session layer generates the “message” with all of its randomly
selected attributes and, if flow control or node hold-down are
not in effect, submits it to the transport layer. It then schedules
the next message arrival time. During initialization, the task
“SL_Rcv_Msg” for each node is queued in Sim_Q for the
arrival time of the first message at that node. When this task
is executed by the simulator, a message packet is generated
and placed in the transport queue. The arrival of the next
message is then queued in Sim_Q with the same task and with
an arrival time determined by the random number generator
(Poisson distributed). The only other task performed by the
session layer is the “SL_Snd_Msg” task that simulates delivery
of messages to the subscribers, develops message statistics and
“cleans up” the queues for messages delivered.
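For illustration, Poisson message arrivals can be generated by drawing exponentially distributed interarrival times. The sketch below is an assumption about how such a generator might look, not the simulator's code; the rate, count and seed parameters are hypothetical.

```python
import random

def poisson_arrival_times(rate, count, seed=0):
    """Arrival times of a Poisson process with `rate` messages per unit time."""
    rng = random.Random(seed)           # seeded for repeatable runs
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)      # exponential interarrival time
        times.append(t)
    return times

arrivals = poisson_arrival_times(rate=2.0, count=5)
```

Each new arrival time would be queued in the event list with the message-generation task, as the text describes.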

2) The Transport Layer: The basic function of the transport
layer at the sending end is to receive the message from the
session layer, place it in packets and pass the packets on
to the network layer. At the receiving end, the packets are
reassembled into a message for delivery to the session layer.

To accomplish the complex task of assuring reliable delivery, 
there is a transport time-out mechanism at both the sending and 
receiving nodes and a message acknowledgement packet that is 
sent to the sending node when all packets for the message have 
been satisfactorily received. At the sending end, if a message 
acknowledgment is not received in the allotted time period, 
the message can be retransmitted. In the simulations reported 
in this paper, the retransmission feature was not used. At the 
receiving end, if all packets are not received in the specified 
period of time, the entire message is discarded. It is recognized 
that in some networks, packetization takes place at the network 
level, leaving the transport layer responsible only for message- 
level structures. Reassembly, depending upon the protocol, 
can take place as low as the datalink level. These tasks were 
both placed in the transport layer, but are modular, and could 
be extracted and placed elsewhere. Also, the simulator was 
originally designed for datagram service, and since the packets 
do not necessarily arrive in order, it is unlikely that assembly 
would take place at the datalink level. 
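A minimal sketch of packetization and order-independent reassembly is shown below. The packet fields are hypothetical; the rule that an incomplete message is discarded follows the text, with the time-out itself left out.

```python
def packetize(message_id, data, payload_size):
    """Split a message into numbered packets of at most `payload_size` bytes."""
    chunks = [data[i:i + payload_size]
              for i in range(0, len(data), payload_size)]
    return [{"msg": message_id, "seq": n, "total": len(chunks), "payload": c}
            for n, c in enumerate(chunks)]

def reassemble(packets):
    """Return the message bytes, or None if any packet is missing."""
    if not packets:
        return None
    total = packets[0]["total"]
    by_seq = {p["seq"]: p["payload"] for p in packets}
    if len(by_seq) != total:            # incomplete: discard the whole message
        return None
    return b"".join(by_seq[n] for n in range(total))

data = bytes(range(10))
packets = packetize(1, data, payload_size=4)
```

Because packets are keyed by sequence number, datagram delivery in arbitrary order does not affect the reassembled message.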

3) The Network Layer: The network layer is concerned with
controlling the operation of the network. A key design issue is
determining how packets are routed from source to destination.
Another issue is how to avoid the congestion caused when too
many packets are presented to the network at the same time. In
the simulator, the network layer performs all of the functions
related to these two aspects with the exception of that aspect
of flow control which takes place at the session layer, and
the recovery protocols which require some service from the
datalink layer. It also activates new channels when needed
and determines when packets originating at other nodes are to
be discarded. The network layer is currently the most dynamic
with regard to the coding of modules. Five modules currently
comprise the network layer. These include three relatively static
modules: one module for capturing lines or channels when
more capacity is required and releasing them when they are
not needed; one module for the network processor and queue
handling; and one module for the routines which are common
to most routing algorithms. This leaves two modules for the
dynamic parts of the routing and flow control algorithms.

4) The Datalink Layer: The main task of the datalink layer
is to take the raw transmission facility and transform it into
a line or channel that appears free of transmission errors to
the network layer. It simulates the sending of the message
over the channel and the delivery at the other end. When a
packet is received, the datalink acknowledgment is initiated
either by the piggy-back acknowledgment or by generating a
datalink acknowledgment packet. As mentioned previously, the
datalink level also simulates the physical layer on a statistical
basis. (Entered bit error rates are used in conjunction with
a random number generator to determine if messages are
corrupted.) When a line is “brought up,” health packets are
used to establish initial connections. Also, when a line “goes
down,” an active node will immediately issue health check
packets to ascertain when the channel is again available.
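The statistical treatment of the physical layer can be sketched as follows. Assuming independent bit errors (an assumption, since the text does not state the error model), a packet of n bits is corrupted with probability 1 - (1 - BER)^n, and a random draw decides the outcome; the function names are hypothetical.

```python
import random

def corruption_probability(ber, n_bits):
    """Probability that at least one of n_bits is in error (independent errors)."""
    return 1.0 - (1.0 - ber) ** n_bits

def is_corrupted(ber, n_bits, rng):
    """Draw one random number to decide whether this packet is corrupted."""
    return rng.random() < corruption_probability(ber, n_bits)

rng = random.Random(1)
p_example = corruption_probability(1e-6, 8000)   # roughly 0.8% for 8000 bits
```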
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C. Modifications 

A major problem of using this system as a simulation tool
for the study of packet video is that, as initially designed,
the system did not actually transmit messages from node to
node. While a “packet” carrying all the necessary descriptive
information moved from node to node, there was no actual
data in the packet. Therefore, modifications had to be made to
the simulator to accommodate the video data. In the sending
node, a field called “Image” which contains real image data is
attached to the record “Packet_Ptr” allocated to the message
generated in the session layer. There are three new modules
in this layer. First, “Get_Image” puts the image data into the
image field of a message generated at a specific time and
node. Second, “Image_Available” checks to see if there is
any image data that still needs to be transmitted. If that is
true, the following message, generated at that specific node, is
still an image message and contains some image data. Third,
“Receive_Image” collects the image data in the session layer
of the receiving node when the flag “Image_Complete” is on.

In module “Session_Msg_Arrive,” different priorities are as- 
signed to different messages. In module “Session_Msg_Send,” 
some statistics are calculated including the number of lost 
image packets and the transmission delay for image packets. 

In the original design, the transport layer simply duplicated
the same packet with different assigned sequential packet num-
bers without actually packetizing the message. The module
“Transport_Packetize” has been modified to actually packe-
tize the image data which resides in the message record
queued in “Transport_Q” when it is called. The module
“Transport_Reassemble” is called to reassemble these image
packets according to their packet number when the flag
“Image_Content” defined in “Packet_Ptr” is true. The network
layer is responsible for routing and flow control. This module
was already very well developed, so the modifications
required here were relatively minor. In the datalink layer, in
order to simulate the delivery of packets through the channel,
a new packet is generated at the receiving node and the
information, including the image data, from the transmitted
packet (which will still be resident at the sending node) is
copied into it. Using existing bit error rates, the transmission
success rate can be set and bit errors can be inserted in both the
data and control bits in the packet. Errors in the control bits are
simulated separately as long as the error rates are consistent.
If an error in the control bits occurs, the transmission is
assumed to fail and retransmission will occur, again depending
on the threshold of the timeout number. In addition to the
modifications made to the layer modules, we had to arrange
for some new memory elements to be allocated for image messages
and packets. In order to make sure the simulation is run in the
steady state, the image data is made available to the network
after some simulation time has passed.


V. Interaction of the Coder and the Network 

When the video data is packed and sent into a nonideal 
network, some problems emerge. These are discussed in the 
following section.

A. Packetization 

The task of the packetizer is to assemble video information,
coding mode information, if it exists, and synchronization
information into transmission cells. In order to prevent the
propagation of errors resulting from packet loss, packets
are made independent of each other and no data from the same
block or same frame is separated into different packets. The
segmentation process in the transport layer has no information
regarding the video format. To avoid the bit stream being
cut randomly, the packetization process has to be integrated
with the encoder, which is in the presentation layer at the
user's premises. Otherwise, some overhead has to be added
to the datastream to guide the transport layer in performing the
packetization in the desired manner. In order to limit the
packetization delay, it is necessary to stuff the last cell of a packet
video with dummy bits if the cell is not completely full.

Every packet must contain an absolute address which indi-
cates the location of the first block it carries. Because every
block in MBCPT has the same number of bits within a given pass,
there is no need to indicate the relative address of the following
blocks contained in the same packet. There is always a
tradeoff between packaging efficiency and error resilience. If
error resilience is the primary concern, a packet should contain a
smaller number of blocks. However, since each channel access
by a station carries overhead, the packet length should be
large for transmission efficiency. Fixed length packetization is
used in this paper for simplicity.
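The addressing idea can be sketched for the first pass, where every block is coded and the blocks in a packet are therefore consecutive. This is a simplified illustration: the packet layout, the bit strings and the function names are all hypothetical.

```python
def packets_for_pass(coded_blocks, blocks_per_packet):
    """coded_blocks: list of (block_index, bit_string) with equal-length strings."""
    packets = []
    for i in range(0, len(coded_blocks), blocks_per_packet):
        group = coded_blocks[i:i + blocks_per_packet]
        packets.append({"address": group[0][0],     # absolute address
                        "payload": "".join(bits for _, bits in group)})
    return packets

def unpack(packet, bits_per_block):
    """Recover (block_index, bit_string) pairs using only the packet itself."""
    payload = packet["payload"]
    count = len(payload) // bits_per_block
    return [(packet["address"] + k,
             payload[k * bits_per_block:(k + 1) * bits_per_block])
            for k in range(count)]

coded = [(i, format(i, "04b")) for i in range(5)]   # toy 4-bit codes
pkts = packets_for_pass(coded, blocks_per_packet=2)
```

Since each packet can be decoded from its own address and the fixed block length, the loss of one packet does not prevent the blocks in later packets from being placed correctly.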

Because of the structure of the coding scheme, the packets 
are classified into four priorities, with the packets from the first 
pass classified as the highest priority packets, and the packets 
from the fourth pass as the lowest priority packets. 

This priority assignment also reflects the importance of the
various packets to the reconstruction of the image sequence
at the receiver. Table I shows the effect on the reconstruction
error in the received sequence when approximately the
same number of packets is lost in each pass.


B. Error Recovery 

There is no way to guarantee that packets will not get
lost after being sent into the network. Packet loss can be
mainly attributed to two problems. First, bit errors can occur
in the address field, leading the packets astray in the network.
Second, congestion can exceed the network's management
ability, so that packets must be discarded due to buffer
overflow. When packets from higher passes (like
pass 4) are lost in MBCPT coding, the missing data are replaced
with zeros and the effect is masked by the basic passes. The
distortion is almost invisible
when viewing at video rates because the lost area is scattered
spatially and over time. However, loss of packets from the low
passes (like pass 1), though rare due to their high priority, will
create an erasure effect due to packetization, and the effect is very
objectionable.

Considering the tight time constraint, retransmission is not
feasible in packet video. It may also result in more severe
congestion. Thus, error recovery has to be performed by the
decoder alone. In our differential MBCPT scheme, the packets
from pass 4 are labeled lowest priority and form a great part
of the total data. These packets can be discarded whenever
network congestion occurs. That will reduce the network
congestion and will not cause too much degradation in quality.
The erasures caused by basic pass loss are simply covered
with the reconstructed values from the corresponding area in
the previous frame. This remedy seems insufficient even when
there is only a small amount of motion in that area. Motion
detection and motion compensation could be used to find a
best matched area for replacement in the previous frame.

TABLE II
Number of Bits Transmitted for Each Pass and the Total
Number of Bits Transmitted for Each Frame

Frame       Overhead   Pass 1   Pass 2   Pass 3   Pass 4    Total
  1           2588      4352     8400    24248    24416     64004
  2           1772      4352     5992    15232    11312     38660
  3           2156      4352     7168    19432    20104     53212
  4           2088      4352     6888    18760    13216     45304
  5           2164      4352     7112    19600    17416     50644
  6           1988      4352     6328    17920    14336     44924
  7           2352      4352     7448    21896    22736     58784
  8           2432      4352     7952    22512    25704     62952
  9           2316      4352     7504    21336    24136     59644
 10           2568      4352     7840    24528    26992     66360
 11           1892      4352     6048    16856    11144     40292
 12           2352      4352     7616    21728    18200     54248
 13           1968      4352     6384    17584    15008     46296
 14           2468      4352     7840    23128    26936     64734
 15           2216      4352     9352    18088      728     34736
 16           1496       --       --       --     12936     36164

Total        34816     69632   114408   315672   287392    820992
Mean          2176      4352     7150    19729    17962     51312
Deviation      290         0     1094     3179     7000     10395
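The two concealment rules described above can be sketched for a single block: a lost refinement pass contributes zeros, while a lost basic pass is covered by the co-located block of the previous frame. This is an illustrative model in which each pass contributes an additive correction; the data layout is hypothetical.

```python
def reconstruct_block(passes, previous_block):
    """passes: four per-pass contributions (lists), or None where a pass is lost."""
    if passes[0] is None:                   # basic pass lost: erasure
        return list(previous_block)         # copy from the previous frame
    block = [0] * len(previous_block)
    for contribution in passes:
        if contribution is not None:        # a lost refinement adds zeros
            block = [b + c for b, c in zip(block, contribution)]
    return block

previous = [90, 91]
received = [[100, 100], [5, -5], [1, 1], [0, 2]]
full = reconstruct_block(received, previous)
no_pass4 = reconstruct_block(received[:3] + [None], previous)
no_pass1 = reconstruct_block([None] + received[1:], previous)
```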

Side information in the MBCPT decoding scheme is very
important, so this vital information must not be
lost. Two methods can be used for protection. First, error
control coding, like block codes or convolutional codes, can
be applied in both directions, along and perpendicular
to the packetization. The former is for bit errors in the data
field while the latter is for packet loss. The minimum distance
that the error control coding should provide depends on the
network's probability of packet loss, the correlation of such loss
and the channel bit error rate. Second, from Table II, we can see
that the output rate of side information and pass 1 and even
pass 2 is quite steady. It seems feasible to reserve a certain
amount of channel capacity for these outputs to ensure their
timely arrival. That means circuit-switching can be used for
important and steady data.

C. Flow Control 

In order to shield the viewer from severe network conges-
tion, there are some flow control schemes which are considered
useful. If there is an interaction between the encoder and
the transport layer, then the encoder can be informed about
the network condition. Depending on that, the encoder can
adjust its coding scheme. In the MBCPT coding scheme, if the
buffer is getting full, the bit generating rate
is overwhelming the packetization rate, and the encoder will
switch to a coarse quantizer with fewer steps or loosen the
threshold to decrease its output rate. In this way, smooth qual-
ity degradation is obtainable. However, this also complicates
the encoder design.
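One possible form of this feedback is sketched below: the distortion threshold is loosened when the buffer is nearly full and tightened again when it drains. The watermarks, the scaling factor and the floor are hypothetical parameters chosen only for illustration.

```python
def adjust_threshold(threshold, buffer_fill,
                     high=0.8, low=0.3, factor=1.25, floor=1.0):
    """Return the new distortion threshold given buffer occupancy in [0, 1]."""
    if buffer_fill > high:                  # buffer filling: code fewer blocks
        return threshold * factor
    if buffer_fill < low:                   # buffer draining: restore quality
        return max(floor, threshold / factor)
    return threshold

loosened = adjust_threshold(10.0, buffer_fill=0.9)
unchanged = adjust_threshold(10.0, buffer_fill=0.5)
tightened = adjust_threshold(10.0, buffer_fill=0.1)
```

A larger threshold lets more blocks pass the distortion test early, so fewer blocks reach the later passes and the output rate falls.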

It is possible to use the congestion control of the network 
protocols to prevent drastic quality change by assigning dif- 
ferent priorities to packets from different passes. Ignoring the 
relative importance of each packet and discarding packets 
blindly sometimes brings disaster and can cause a session shut 
down. For example, if the side information gets lost it can 
have a severe impact on the decoding process. In the MBCPT 
coding scheme, side information and packets from pass 1 are 
assigned highest priority and higher pass packets are assigned 
with decreasing priority. 
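A minimal sketch of priority-aware discarding at a congested buffer follows. It assumes each packet is tagged with its pass number, with 0 used here for side information; the tag, the tie-breaking rule and the function name are assumptions.

```python
def enqueue(buffer, packet, capacity):
    """Add a packet; on overflow drop the lowest-priority one and return it."""
    buffer.append(packet)
    if len(buffer) <= capacity:
        return None
    # Highest pass number = lowest priority; among equals drop the newest.
    victim = max(range(len(buffer)), key=lambda i: (buffer[i]["pass"], i))
    return buffer.pop(victim)

buffer = []
dropped = [enqueue(buffer, {"pass": p}, capacity=3) for p in (1, 4, 2, 4, 1)]
```

In this run both discarded packets come from pass 4, while the pass 1 and pass 2 packets survive the overflow.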

D. Interaction with Protocols 

In the ISO model, the physical, datalink and network layers
comprise the lower layers which form a network node. The
higher layers are the transport, session, presentation and appli-
cation layers and typically reside in a customer's premises. The
lower layers have nothing to do with the signal processing
and only work as a “packet pipe.” The physical layer requires
adequate capacity and a low bit error rate, which are determined
only by technology. The datalink layer can only deal with
link management because mechanisms like requesting
retransmission are not feasible in packet video transmission.
The network layer has to maintain orderly transmission by
removing the delay jitter with input buffering. In addition, it can
take care of network congestion by assigning transmission
priority.

As the higher layers reside in the customer's premises, they
perform all the functions of the packet video coder. The
transport layer does the packetization and reassembly. The
packet length can be fixed or variable. Fixed packet length
simplifies segmentation and packet handling while a variable
packet length can keep the packetization delay constant. The
session layer supervises set-up and tear-down for sessions
which have different types and quality. There is always a trade-
off between quality and cost. The quality of a set-up session
can be determined by the threshold in the coding scheme and
the priority assignment for transmission. Of course, the better
the quality, the higher the cost. Fig. 13 shows the tradeoff
between PSNR and video output rate obtained by adjusting thresholds.
The presentation layer does most of the signal processing,
including separation and compression. Because it knows the
video format exactly, if any error concealment is required,
it will be performed here. The application layer works as a
boundary between the user and the network and deals with all
the analog-digital signal conversion.

Fig. 13. PSNR versus video output rate (video transmission at 30 frames/s).

VI. Performance Results 

Results obtained in this packet video simulation show that
substantial compression can be obtained while maintaining
high image quality through the use of this differential MBCPT
scheme. The monochrome sequence used in this simulation
contains 16 frames, each of size 256 x 256 pixels with 8 bits
per pixel, which results in a bit rate of 15.7 Mbits/s, given
a video rate of 30 frames/s. As Table II shows, the average
data rate of our system is 1.539 Mbits/s. The compression
ratio is about 10 with a mean PSNR of 38.74 dB, where PSNR
is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{(255)^2}{\mathrm{MSE}}$$

and MSE is the mean squared error between the original and
the reconstructed image.
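The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the standard PSNR definition for 8-bit images and the mean of 51312 bits per frame from Table II; the raw rate works out to about 15.7 Mbits/s and the ratio to about 10.2. Function and variable names are hypothetical.

```python
import math

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

raw_rate = 256 * 256 * 8 * 30        # bits/s for the uncompressed sequence
coded_rate = 51312 * 30              # mean bits per frame times 30 frames/s
compression_ratio = raw_rate / coded_rate
```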

Fig. 14 shows the data rate of the sequence frames for the side
information, the 4 passes and the total. It is clear that the data
rate of pass 1 is constant as long as the quantization mode
remains the same. Side information and data from pass 2, and even
pass 3, are also relatively constant (Table III). The data rate of
pass 4 is bursty and highly uncorrelated. As pass 4 data
is not essential to the reconstruction of the image, the rate
profiles shown in Fig. 14 and Table I suggest the use of
a reserved channel of some sort for passes 1-3 and the side
information, and perhaps a more unreliable channel for pass 4
data, which comprises more than 30% of the total traffic. Such
a situation can be accommodated in a variety of systems such
as a token ring network or a circuit switched network with a
packet-switched overlay.

Fig. 14. Data rate of simulation sequence frames.

TABLE III
Output Bit Rate for Each Pass and the Total,
Calculated with 30 Frames/s Video Rate. The Maximum and Minimum
Values are the Instantaneous Rates, Which Correspond to the
Respective Maximum and Minimum Number of Bits Needed to
Encode a Particular Frame in the Sequence. The Unit is Kilobits.

            Overhead   Pass 1   Pass 2   Pass 3   Pass 4     Total
Mean          65.28      --       --       --     538.86    1539.36
Deviation      8.70      --       --       --     210.00     311.85
Maximum       77.04      --       --       --     821.52    1990.80
Minimum       44.88      --       --       --      21.84    1042.08

Fig. 15 shows the PSNR for each frame in the sequence. 
Notice that the standard deviation of the PSNR is only 0.2 dB, 
which implies a substantial uniformity of quality, at least 
in terms of objective performance measures. If constancy 
with regard to some subjective criterion is desired, it would 
be necessary to incorporate this in the determination of the 
thresholds and the decision mechanism for the quad tree. In 
the simulation, the same threshold has been used throughout 
the sequence. If further flexibility, say for higher visual quality, 
is desired, a varying threshold can be used for different frames. 
That may generate a more variable bit rate. 

From the difference images of this sequence, frames 1-8
seem quite motionless while frames 9-13 contain substantial
motion. We adjusted the traffic condition of the network to
force some of the packets to get lost and thus check the
robustness of the coding scheme. Heavy traffic was set up
in the motionless and motion periods separately. The average
packet loss percentage was 3.3%, which is considered high
for most networks. Fig. 16 shows images which suffered
packet losses from pass 4. As can be seen, the effect of lost
packets is not at all severe, even though the lost packet rate is
unrealistically high. This is because the performance from
the first three passes is relatively good and the packets from
the fourth pass are not essential for reconstruction. Fig. 17
shows the case when packet loss occurs in pass 1. Clearly
there are visible defects in the motion period. Further, the
error will propagate to the following frames. Apparently, the
replenishing scheme used here is not sufficient in areas with
motion. It is believed that this inconsistency can be eliminated
with a motion compensation algorithm, which would find the
appropriate area for replenishment, and error concealment,
which limits the propagation of error.

Fig. 15. PSNR of simulation sequence frames.

Fig. 16. (a) The effect of pass 4 packet loss for frame 4. (b) The effect of
pass 4 packet loss for frame 10.

VII. Conclusion

The network simulator was used only as a channel in this
simulation. In fact, before the real-time processor is built, a
lot of statistics can be collected from the network simulator
to improve upon the coding scheme. These include transmis-
sion delays and losses from various passes under different
network loads. For resynchronization, the delay jitter between
received packets can also be estimated from the simulation.
The environment for tomorrow's telecommunication has been
described and requires a flexibility which is not possible
in a circuit-switched network. With all the requirements for
applying packet video in mind, MBCPT has been investigated.

Fig. 17. (a) The effect of pass 1 packet loss for frame 3. (b) The effect of
pass 1 packet loss for frame 9.

It is found that MBCPT has appealing properties, like a high
compression rate with good visual performance, robustness
to packet loss, tractable integration with network mechanics
and simplicity in parallel implementation. Some additional
considerations have been proposed for the entire packet video
system, like designing protocols, packetization, error recovery
and resynchronization. For fast moving scenes, the differential
MBCPT scheme seems insufficient. Motion compensation,
error concealment or even attaching function commands to
the coding scheme are believed to be useful tools to improve
the performance and will be the direction of future research.
Abstract

We study the performance of a DPCM system with a recursively indexed quantizer (RIQ) under 
various conditions, with first order Gauss-Markov and Laplace-Markov sources as inputs. We show that 
when the predictor is matched to the input, the proposed system performs at or close to the optimum 
entropy constrained DPCM system. We also show that if we are willing to accept a 5% increase in
the rate, the system is very forgiving of predictor mismatch. 


1 Introduction 

Differential pulse code modulation (DPCM) is often used to efficiently convert an analog source such as 
speech, music or images into a digital form for communication or storage. Its efficiency is due to the 
exploitation of the memory in a source by the use of a predictor, which estimates the present source sample 
to be encoded, based on the quantized previous source samples. The performance of DPCM depends on two 
factors 

1. How well the predictor exploits the source memory, i.e., how closely it can estimate the actual source
samples. 

2. How well the quantizer is matched to the prediction error (the quantizer input). 

In order to maximize the goodness of prediction, the predictor is usually chosen based on the statistical 
properties of a given source. However, many physical sources, such as those listed above, exhibit statistically 
varying local properties which are usually quite distinct from their global ones. If this is the case and DPCM 
happens to operate on a segment whose statistics differ from the global ones, it operates in a mismatched 
state, which results in additional degradation in the reproduction [1]. 

Various schemes have been devised to handle this mismatch between the source and the predictor. These 
involve some form of adaptation of the quantizer and/or the predictor, by which DPCM quickly follows up 
the changing statistics of the source and prevents overloading of the system. However, this quick response 
of DPCM is not without its cost: the adaptation requires more implementation and operational complexity. 

Matching the quantizer to the statistics of the prediction error is even more difficult, as the quantizer 
structure itself affects the statistical properties of the prediction error. One way to obtain the statistics of
the prediction error process is through the use of an orthonormal expansion [2, 3, 4, 5]. This has been used to 
optimize the quantizer through an iterative procedure [4, 6, 7, 8]. However, under operational circumstances 
this might not be a viable option. 

*This work was supported by NASA Lewis Research Center under grant NAG 3-806, and the Goddard Space Flight Center
under grant NAG 5-1615.

†Department of Electrical Engineering, University of Nebraska - Lincoln, Lincoln, NE 68588-0511
‡Department of Electronic Engineering, Ajou University, Suwon, Korea
In this paper we study the performance of a DPCM system operating on a Gauss-Markov source and a
Laplace-Markov source. The DPCM system considered here, called a recursively indexed DPCM, consists 
of a uniform quantizer with infinitely many output levels, a recursively indexed binary encoder, and a 
first-order linear predictor. The quantizer is designed simply by specifying its step-size, and the predictor 
is non-adaptive. The Gauss-Markov and Laplace-Markov sources are chosen because of their use in the 
modeling of physical sources. The goal is to observe the rate distortion performance of this system, as 
well as the performance degradation when the predictor and the source are mismatched. We compare the 
rate distortion performance of the proposed system to the optimum results in the literature [6], where the
quantizer was optimized using the iterative procedure mentioned above. 

The simulation results show that the rate distortion performance of the proposed system achieves or comes 
very close to the optimum performance at all rates studied. In the case of mismatch, the simulation results
show that accepting only a 5 percent increase in the rate permits a rather wide range of predictor mismatch for
first-order Gauss-Markov and Laplace-Markov sources. They agree with a result in [9], reported for a 2-level
optimized DPCM, that the predictor coefficient does not significantly affect the optimality of DPCM. But
unlike in [9], it is observed that a predictor coefficient lower than the source correlation coefficient is better
for the low rate case.

In Section 2 a recursively indexed binary encoder is discussed. In Section 3 DPCM is briefly reviewed and 
the DPCM mismatch problem is posed. In Section 4 the performance of a recursively indexed DPCM is 
considered and numerical results are presented. Conclusions follow in Section 5.


2 A Recursively Indexed Binary Encoder 

The DPCM system discussed here uses a quantizer with infinitely many output levels. This requires binary 
encoding of a countably infinite alphabet, which poses obvious problems in design and operation. An obvious 
and reasonable approach is first to represent the input alphabet using only finitely many symbols and then
to encode these symbols either using a fixed-to-fixed or fixed-to-variable length encoding. A recursively 
indexed binary encoder is used for just this purpose. 

The recursively indexed binary encoder considered in this paper is a two stage binary encoder: recursive 
indexing followed by an optimum (the minimum average codeword length) symbol-to-variable length binary
encoder for the output of recursive indexing. 

Recursive indexing is a mapping of a countable set to a collection of sequences of symbols from another set
of finite size [10, 11]. Given a countable set $A = \{a_0, a_1, \ldots\}$ and a finite set $B = \{b_0, b_1, \ldots, b_{M-1}\}$ of size
$M$, the recursive indexing of $A$ by $B$ is a mapping $I$ of $A$ to the collection of all sequences of symbols from
$B$ such that

$$I(a_i) = \underbrace{b_{M-1}\, b_{M-1} \cdots b_{M-1}}_{q \text{ times}}\, b_r \quad \text{if } i = q(M-1) + r \tag{1}$$

where $q$ and $r$ are the quotient and remainder of $i$ when divided by $M - 1$. Set $B$ is called the representation
set. Defined as such, recursive indexing is a one-to-one mapping and a symbol-to-variable length, $M$-ary, prefix-free
code, and is therefore uniquely and instantaneously decodable.
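The mapping in (1) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `recursive_index` and `recursive_deindex` are hypothetical, and symbols are represented by their integer subscripts (source symbol $a_i$ by `i`, representation symbol $b_j$ by `j`), which is an assumption made for convenience.

```python
def recursive_index(i, M):
    """Map source index i >= 0 to a list of representation-symbol indices.

    Following (1): write i = q*(M-1) + r, emit the last symbol (index M-1)
    q times, then the terminating symbol with index r.
    """
    if M < 2:
        raise ValueError("M must be at least 2")
    if i < 0:
        raise ValueError("i must be non-negative")
    q, r = divmod(i, M - 1)
    return [M - 1] * q + [r]


def recursive_deindex(symbols, M):
    """Invert recursive_index on a concatenated stream of symbol indices."""
    decoded = []
    q = 0
    for s in symbols:
        if s == M - 1:
            q += 1                      # one more repetition of the last symbol
        else:
            decoded.append(q * (M - 1) + s)
            q = 0                       # a terminating symbol closes the codeword
    return decoded
```

Because every codeword ends with a symbol other than $b_{M-1}$, the decoder can split the stream without any separators, which is the prefix-free property noted above.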

Since the second stage of the recursively indexed binary encoder is an optimum symbol-to-variable length
encoder (designed using the Huffman algorithm), the statistics of the representation symbols must
be computed. For this purpose we first compute the number of representation symbols needed to describe
a typical source sequence $X_1, X_2, \ldots, X_n$ of length $n$ from set $A$. Define $p_k = \Pr\{X = a_k\}$;
then the number $n_0$ of occurrences of symbol $b_0$ is computed as follows. Observe that $b_0$ occurs once
whenever $a_0, a_{M-1}, a_{2M-2}, \ldots, a_{k(M-1)}, \ldots$ occur. The numbers of times these symbols occur are
$np_0, np_{M-1}, np_{2M-2}, \ldots, np_{k(M-1)}, \ldots$, and so on. Therefore,

$$n_0 = n \sum_{k=0}^{\infty} p_{k(M-1)}. \tag{2}$$

In a similar manner the numbers $n_j$ of occurrences of the symbols $b_j$ are found to be

$$\begin{aligned}
n_j &= n \sum_{k=0}^{\infty} p_{k(M-1)+j} \quad \text{for } j = 0, 1, \ldots, M-2, \\
n_{M-1} &= n \sum_{k=0}^{\infty} \sum_{j=0}^{M-2} k\, p_{k(M-1)+j}.
\end{aligned} \tag{3}$$


From these it is seen that on the average the number of representation symbols needed for $n$ source symbols
is

$$\sum_{j=0}^{M-1} n_j. \tag{4}$$

Therefore, the average number of representation symbols needed to represent one source symbol is

$$\frac{1}{n} \sum_{j=0}^{M-1} n_j = 1 + \sum_{k=0}^{\infty} \sum_{j=0}^{M-2} k\, p_{k(M-1)+j}. \tag{5}$$

It is convenient to define the above expression as the expansion factor, denoted $e$, of the recursive indexing
$I$. It is the factor by which one source symbol is expanded by the recursive indexing. The relative frequency
$q_j$ of representation symbol $b_j$ is computed as follows:

$$q_j = \frac{n_j}{\sum_{i=0}^{M-1} n_i} = \frac{n_j}{n e}, \quad j = 0, 1, \ldots, M-1. \tag{6}$$

An optimum symbol-to-variable length binary encoder after the recursive indexing takes one representation
symbol at a time and produces the corresponding binary sequence from a set of variable length code words.
It is designed, for example, using the Huffman algorithm. Then its rate $R_{RI}$, the number of binary digits
per representation symbol, is bounded by

$$H(B) \le R_{RI} < H(B) + 1 \tag{7}$$

where $H(B)$ is the entropy of the representation symbols. In designing the
symbol-to-variable length code by the Huffman algorithm it is observed that the rate $R_{RI}$ is almost equal to $H(B)$,
the lower bound, when $M$ is large.

The overall rate $R$ of the recursively indexed binary encoder is then bounded by

$$e H(B) \le R < e H(B) + e. \tag{8}$$

Note that $e$ approximately equals 1 if $M$ is large.
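The quantities in this section can be computed directly from a source probability table. The following is a minimal sketch, assuming the source distribution is given as a finite (truncated) list `p` with `p[i]` the probability of $a_i$; the function name `expansion_and_entropy` is hypothetical. It accumulates the per-symbol counts of (2) and (3) with $n = 1$, so that their sum is the expansion factor $e$.

```python
import math


def expansion_and_entropy(p, M):
    """Return (e, q, H_B) for recursive indexing with M representation symbols.

    p : list of source probabilities, p[i] = Pr{X = a_i} (assumed truncated pmf)
    e : expansion factor (average representation symbols per source symbol)
    q : relative frequencies of the representation symbols b_0 .. b_{M-1}
    H_B : entropy of the representation symbols, in bits
    """
    counts = [0.0] * M
    for i, p_i in enumerate(p):
        k, j = divmod(i, M - 1)
        counts[j] += p_i              # terminating symbol b_j appears once
        counts[M - 1] += k * p_i      # last symbol b_{M-1} appears k times
    e = sum(counts)
    q = [c / e for c in counts]
    h_b = -sum(x * math.log2(x) for x in q if x > 0.0)
    return e, q, h_b
```

The product `e * h_b` is then the lower bound on the overall rate, and `e * h_b + e` the upper bound. When `M - 1` is at least the number of source symbols with non-zero probability, no codeword needs a repetition and `e` equals 1.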


3 DPCM with Recursively Indexed Binary Encoding 

3.1 Source and DPCM 

Let us consider the encoding by a DPCM system of a first-order Gauss-Markov process,

$$X_k = \rho X_{k-1} + W_k \tag{9}$$

where $W_k$ is an independent identically distributed Gaussian sequence with mean zero and variance $\sigma_w^2$. The source
correlation coefficient $\rho$ is between $-1$ and $1$.

The DPCM system consists of a quantizer, a predictor and a binary encoder. In a typical operational cycle
the difference $Z_k$ between the source output $X_k$ and its prediction $\hat{X}_k$ is quantized by quantizer $Q$, yielding
$Q(Z_k)$, which in turn is binary encoded. The predictor considered in this paper is a first-order linear
predictor and therefore it is given by $b Y_{k-1}$ for some constant $b$.

We will say that DPCM is matched to the source if the source correlation $\rho$ equals the predictor coefficient
$b$ and that it is mismatched otherwise.

The binary encoder in ordinary DPCM is either a fixed-to-fixed length or a fixed-to-variable length binary
encoder. In the former the binary encoder takes the index of the quantizer output level, produces its binary
representation and sends the binary sequence through the channel. In the latter blocks of quantizer outputs
are buffered and (usually) entropy-encoded.

The performance of a DPCM system will be measured by distortion and rate. The distortion incurred is
defined to be

$$D = \limsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E\{(X_k - Y_k)^2\}. \tag{10}$$

It is well known that the error of the DPCM system is that incurred by the quantizer alone and nowhere
else, i.e., $X_k - Y_k = Z_k - Q(Z_k)$. Hence the distortion can be rewritten as

$$D = \limsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E\{(Z_k - Q(Z_k))^2\}. \tag{11}$$

The rate is defined to be the average number of binary digits used to transmit one source symbol. In the case
of a fixed-to-fixed length binary encoder it is given by $\lceil \log_2 N \rceil$, where $N$ is the number of quantizer output
levels. In the case of a fixed-to-variable length entropy encoder it is approximately $H(Q)$, the entropy of the
quantizer output process.

3.2 Recursively Indexed DPCM 

The DPCM system considered in this paper is a recursively indexed DPCM system. It is different from an 
ordinary system in the following two ways: 

1. The quantizer Q is an infinite level uniform quantizer with the thresholds being the mid-points of 
output levels. 

2. The binary encoder is a recursively indexed fixed-to-variable length encoder.

The quantizer with infinitely many uniformly spaced output levels yields granular distortion only. The
magnitude of the quantization error is bounded by $\Delta/2$, where $\Delta$ is the step-size. Therefore, no matter how large the input to the quantizer
is, due to bad prediction, its output is at most $\Delta/2$ different from its input. Since an unoverloaded quantized
value is available to the predictor at the next prediction, the system can track the source output, thereby
yielding lower prediction errors. Due to this quick response the system does not have catastrophic error
propagation, which DPCM with a finite number of quantizer output levels has when a pathological source
sequence is encountered.
As discussed in Section 2, a recursively indexed binary encoder is necessary because the quantizer used
has infinitely many output levels. We note that the encoder is not necessarily an entropy-encoder for the 
quantized process. 

The distortion for this system is given by (11), while the rate is simply the entropy of the representation 
symbols multiplied by the expansion factor. 
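The DPCM loop with an infinite-level uniform quantizer can be sketched as follows. This is a minimal illustration rather than the simulator used for the reported results: the function name `simulate_ri_dpcm`, the zero initial conditions and the use of Python's `random` module are all assumptions. The innovation variance is set to $1 - \rho^2$ so that the source variance is unity, and the quantizer output levels are taken to be integer multiples of the step-size, with thresholds at the mid-points.

```python
import random


def simulate_ri_dpcm(rho, b, delta, n, seed=0):
    """Run first-order DPCM with an infinite-level uniform quantizer.

    Returns (mean squared error, largest absolute error, quantizer indices).
    """
    rng = random.Random(seed)
    sigma_w = (1.0 - rho * rho) ** 0.5   # innovation std for unit source variance
    x = 0.0                              # current source sample X_k
    y = 0.0                              # previous reconstruction Y_{k-1}
    sq_err = 0.0
    max_err = 0.0
    indices = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, sigma_w)   # Gauss-Markov source, eq. (9)
        pred = b * y                            # first-order linear prediction
        idx = round((x - pred) / delta)         # nearest level, thresholds at mid-points
        y = pred + idx * delta                  # reconstruction fed back to the predictor
        err = x - y
        sq_err += err * err
        max_err = max(max_err, abs(err))
        indices.append(idx)
    return sq_err / n, max_err, indices
```

The returned indices are signed integers; before recursive indexing they would need to be mapped onto the non-negative indices of the countable alphabet, and that mapping is not specified here. Whatever the predictor coefficient `b`, the largest absolute error never exceeds half the step-size, which is the no-overload property described above.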


4 Simulation Results 

To test this system, first order Gauss-Markov and Laplace-Markov random number generators with correlation
coefficient $\rho$ were used. The Laplace-Markov process was defined as in [6]. The variance of the
innovation sequence was chosen so as to get a source variance of unity. For each realization of the process 
100,000 samples were used. 

4.1 Rate-Distortion Performance 

To obtain the rate-distortion performance, the predictor was matched to the source correlation coefficient 
and the step-size was varied. Each step-size $\Delta$ generated a distortion-rate pair which was then plotted. The
results were overlaid on the optimum results from [6]. The results for $\rho = 0.8$ are plotted in Figures 1 and
2.

For the Gauss-Markov source the rate distortion performance of the proposed system achieves (or almost 
achieves) the optimum performance for the entire range of rates. This is true for both the experimentally 
obtained rates from [6] as well as the asymptotic results. Recall that for the optimum results the quantizer 
was designed using a relatively complex iterative procedure, while for the proposed system the quantizer 
was designed by simply specifying the step-size. 

For the Laplace-Markov source, the proposed scheme again performs as well as the optimum scheme for
almost all rates as far as the experimentally obtained results are concerned. However, the proposed scheme
provides better results than the asymptotic results in [6]. We presume this is due to an error in the asymptotic
results (or our interpretation of them!).

Similar results were obtained for $\rho = 0.5$ and $\rho = 0.2$ for both sources. The results seem to indicate that the
problem of matching the quantizer with the input statistics can be easily resolved by the use of a recursively
indexed quantizer. The quantizer can be easily designed by simply specifying the step-size $\Delta$, which in turn
can be specified based on the distortion requirements.

4.2 Performance Under Mismatch Conditions 

To investigate the effect of mismatch between the predictor and model coefficients we used three values for
the spacing $\Delta$ of the uniform quantizer: $2.5\sigma_x$, $1.5\sigma_x$, and $0.2\sigma_x$. These correspond to low, medium, and high
resolution (rate) quantizations, respectively. For each spacing of the uniform quantizer, the source sequence
is applied to the recursively indexed DPCM system with various values for the predictor coefficient. The
quantizer output sequence is fed to the recursively indexed binary encoder with $M$ representation symbols.
The value of $M$ ranges from 5 to 31. The results are shown in Figure 3, where the horizontal and vertical
axes are respectively the predictor coefficient and the product of the rate and distortion.

For the Gauss-Markov source, for low rate ($\Delta = 2.5\sigma_x$) the distortion is about 0.44. The best performance
is obtained when the predictor coefficient is around 0.6, where the rate is 0.58 bits/sample. This value is far
below the source correlation 0.8. Note that in [4], the best performance was reported around 0.815, slightly
higher than the source correlation. If a 5% increase in rate is allowed, then Figure 3 shows that the predictor
coefficient can be anywhere between 0.4 and 0.8.
For medium rate ($\Delta = 1.5\sigma_x$) the distortion is about 0.186, while $\Delta^2/12$ is 0.1875. The best predictor
coefficient is around 0.7. The rate for this value of $\Delta$ and $b$ is 1.12 bits/sample. The value of $b$ is again lower
than the source correlation. Again a 5% increase in rate allows the predictor coefficient to range between
0.48 and 0.88. For high rate ($\Delta = 0.2\sigma_x$) the distortion is about 0.00333 while $\Delta^2/12$ is 0.00333. The best
predictor coefficient is around 0.8, where the rate is 4.2 bits/sample, and can range from 0.0 to 0.99 if a 5%
increase in rate is allowed.

We observe that the best predictor coefficient moves from below closer to the source correlation as resolution
increases. This may be because for larger values of $\Delta$ the quantization noise tends to be magnified when the
predictor coefficient is larger. Also we note that the distortion expression ($\Delta^2/12$) is quite accurate even
for large $\Delta$. Similar results are observed for the Laplace-Markov source.
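As a quick check on the quoted figures, the granular-noise approximation can be worked out explicitly. This is a sketch under the usual high-resolution assumption (not stated in the text) that the quantization error $u$ is uniformly distributed over one step of width $\Delta$, with $\sigma_x^2 = 1$ as in the simulations.

```latex
D \approx \int_{-\Delta/2}^{\Delta/2} \frac{u^2}{\Delta}\, du = \frac{\Delta^2}{12},
\qquad
\frac{(1.5)^2}{12} = 0.1875,
\qquad
\frac{(0.2)^2}{12} \approx 0.00333,
\qquad
\frac{(2.5)^2}{12} \approx 0.521 .
```

The medium and high rate values agree closely with the measured distortions of about 0.186 and 0.00333. For the low rate case the same expression gives about 0.52, compared with the measured distortion of about 0.44.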


4.3 Effect of the Alphabet Size M 

Finally we look at the effect of the size of the representation alphabet on the rate. Note that the larger the representation
alphabet is for a certain value of $\Delta$, the less likely it is that the encoder will enter the recursive mode. This
implies that given a value of $\Delta$, larger values of $M$ will tend to lower the value of the expansion factor $e$
closer to 1, thus lowering the rate. If this is a very strong effect, recursive indexing loses some of its charm,
as the smaller size of the reproduction alphabet makes it more amenable to entropy coding. The
recursively indexed DPCM system was simulated with representation alphabet sizes 7, 9, 11, 13, and 15. The
results for a Laplace-Markov source are shown in Figure 4. Note that at low rates (large values of $\Delta$) there is
no difference between these sizes. At higher rates, there is a noticeable difference as we increase the alphabet
size from 7 to 11. After that point there is very little improvement obtained in the rate when the alphabet
size is increased.


5 Conclusions 

Recursively indexed DPCM, which features an infinite level quantizer coded with a finite alphabet entropy 
coder has been shown to be an efficient encoder for first order Gauss-Markov and Laplace-Markov sources. 
The use of a recursively indexed quantizer in a standard DPCM system seems to provide a solution both to 
the problem of matching the quantizer to the prediction error statistics, and the problem of exactly matching 
the predictor to the source, at least for these simple sources. 
1 Introduction

The use of local area networks makes it possible to more easily implement algorithms
that require the use of a "side channel". In this paper we present an
ADPCM (Adaptive Differential Pulse Code Modulation) based codec which 
can be conveniently implemented on LANs. 

Adaptive Differential Pulse Code Modulation (ADPCM) is a very popular 
compression technique because it is easy to implement, has low processing 
overhead, and relatively good fidelity. This has made it the algorithm of 
choice in speech compression applications, and as a second stage for subband 
coding and transform coding techniques. However, ADPCM image compres- 
sion is far from ideal. The most obvious drawback is poor edge reconstruction. 
ADPCM cannot track sudden changes in image statistics, and this can cause 
substantial edge distortion in the reconstructed image. A modified ADPCM 
scheme was presented in [1] which relied on the use of side information to 
prevent edge degradation. The technique is well suited for implementation 
on token ring networks. 

In this paper we describe the implementation of this scheme in a token 
ring network environment. The paper is organized as follows. The next 
section gives a brief overview of the aspects of token ring networks that are 
of interest here. The modified ADPCM scheme is briefly described in the 
following section. Then we describe the implementation of the proposed 
algorithm on a token ring network and present simulation results. 
2 Token Ring Networks

In a token ring network, nodes are arranged logically in a ring with each node 
transmitting to the next node around the ring. Each node simply relays the 
received bit stream from the previous node to the next node with at least 
one bit delay. The token is defined as a special bit pattern which circulates 
on the ring whenever all the stations are idle. Whatever node has the token 
is allowed to transmit a packet. When the packet has been transmitted the 
token is passed on to the next node. That is, whenever the node that is 
currently transmitting a packet finishes the transmission, it places the token, 
for example 01111110, at the end of the packet. When the next node reads 
this token, it simply passes the token if it has no packets to send. If it does 
have a packet to send it inverts the last token bit, in our example turning the 
token to 01111111. The station or node then breaks the interface connection 

and enters its own data onto the ring. 

The token ring supports two classes of traffic:

1. Synchronous Traffic: A class of data transmission service whereby each 
requester is pre-allocated a maximum bandwidth and guaranteed a 
response time not to exceed a specific delay. 

2. Asynchronous Traffic: A class of data transmission service whereby all 
requests for service contend for a pool of dynamically allocated ring 
bandwidth and response time. 
A set of timers and several parameters are used to limit the length of time
a station may transmit messages before passing the token to the next station,
and the duration of information transmission of each class within a station
[2]. Each station maintains two timers, the Token_Rotation_Timer (TRT)
and the Token_Holding_Timer (THT). The TRT at node j is used to time
the interval taken by the token to circulate around the ring starting at node 
j. When node j recaptures the token, the value of TRT is assigned to THT 
and TRT is reset. When the network is initialized, the stations decide on 
the value of a target token rotation time (TTRT), so that the requirements 
for maximum access time are met. The upper bounds on the maximum and 
average token rotation time have been studied in [3]; the results show that 
the token rotation time cannot exceed twice the value of TTRT, while the 
average rotation time is not greater than TTRT. The extension to several 
priority classes is obtained by introducing a target rotation time for each 
class, and by using that value to check whether or not the station is allowed 
to transmit frames of that class. 

If a station captures the token before its TRT reaches the value of TTRT, 
it is called an early token. If it captures the token after the TRT has exceeded 
the value of TTRT, it is called a late token. An early token may be used 
to transmit both synchronous and asynchronous traffic, while a late token
may only be used for synchronous traffic. The difference between TTRT and 
TRT will be the available bandwidth for the asynchronous information. The 
amount of time a station can transmit is limited by THT.
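The early/late token rule can be illustrated with a small sketch. The function name `token_budget` and its arguments are hypothetical, and timer values are treated as plain numbers in arbitrary time units; treating a token that arrives exactly at TTRT as late is also an assumption, since the text does not cover that case.

```python
def token_budget(trt, ttrt, sync_alloc):
    """Classify a captured token and return the transmission budget.

    trt        : value of the Token_Rotation_Timer when the token is captured
    ttrt       : target token rotation time agreed at initialization
    sync_alloc : time pre-allocated to this station for synchronous traffic

    Returns (is_early, sync_time, async_time).
    """
    is_early = trt < ttrt
    # An early token leaves TTRT - TRT for asynchronous traffic;
    # a late token may only be used for synchronous traffic.
    async_time = (ttrt - trt) if is_early else 0.0
    return is_early, sync_alloc, async_time
```

For example, with a target rotation time of 10 units, a token captured after 6 units leaves 4 units for asynchronous traffic, while one captured after 12 units leaves none.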

In the following section we describe an image compression scheme which 
takes advantage of lighter loads on the network to provide side information 
to the receiver as asynchronous traffic. This side information is then used to 
increase the quality of the reconstructed image. 

3 Edge Correcting DPCM

The proposed ADPCM system uses a two-bit Robust Jayant quantizer [4, 5].
This is a uniform quantizer whose step-size $\Delta(k)$ is adapted based on the
previous quantizer output level $H(k-1)$ according to the following recursion
[6]

$$\Delta(k) = M(H(k-1))\,[\Delta(k-1)]^{\beta}$$

where $\beta = 1 - \epsilon$, $\epsilon \to 0$, and $M(1) = 0.8$, $M(2) = 1.6$; $H(k) = 1$ if
the output falls into the inner levels of the quantizer and $H(k) = 2$ if the
output is one of the outer levels of the quantizer. As the information about
which level of the quantizer was used for the previous sample is available to
both the transmitter and the receiver, the adaptation does not require the
transmission of any side information.

The Jayant quantizer is designed to track the variance of the quantizer 
noise by changing the step size $\Delta(k)$. Since edges are regions where the
statistics change rapidly, it follows that the step size will expand repeatedly 
when an edge is encountered. This fact is made use of in the following rule
to detect edges:

An edge is detected when the step size of the Jayant quantizer
expands more than $P$ times in succession, $P \ge 1$.

The value of P should be small to reduce detection delay; a value of 
two seems to work well. As both transmitter and receiver have the same 
information both transmitter and receiver will detect edges at the same time. 
Once an edge has been detected the proposed scheme uses an embedded 
quantizer to quantize the quantization error and transmit this value over 
a side channel. The use of an embedded quantizer was first proposed by 
Goodman and Sundberg [7] for use over a noisy channel. In [1] the issue of 
how a side channel could be configured was left open. We address this issue 
in the context of token ring networks in the following section. 
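The step-size adaptation and the edge detection rule can be sketched together. Several details below are assumptions made for illustration and are not taken from the text: the recursion is taken in the form "multiplier times previous step-size raised to the power beta", the two-bit quantizer is assumed to be a midrise quantizer with outputs at plus or minus one half and three halves of the step-size, and the names `quantize_2bit` and `track_edges` and the default `beta` are hypothetical.

```python
M_INNER, M_OUTER = 0.8, 1.6   # multipliers M(1) and M(2) from the text


def quantize_2bit(z, delta):
    """Four-level uniform quantizer; returns (output value, H).

    H = 1 for an inner level, H = 2 for an outer level (assumed midrise layout).
    """
    sign = 1.0 if z >= 0 else -1.0
    if abs(z) < delta:
        return sign * 0.5 * delta, 1
    return sign * 1.5 * delta, 2


def track_edges(inputs, delta0=1.0, beta=0.99, P=2):
    """Adapt the step-size over a sequence of quantizer inputs.

    Returns (step sizes used, edge flags). An edge is flagged once the
    step-size has expanded more than P times in succession.
    """
    delta = delta0
    run = 0                      # consecutive expansions so far
    deltas, edges = [], []
    for z in inputs:
        deltas.append(delta)
        _, h = quantize_2bit(z, delta)
        run = run + 1 if h == 2 else 0
        edges.append(run > P)
        multiplier = M_OUTER if h == 2 else M_INNER
        delta = multiplier * delta ** beta   # assumed form of the recursion
    return deltas, edges
```

Since the flags depend only on past quantizer outputs, a receiver running the same function on the decoded levels would raise them at the same samples as the transmitter.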

4 ADPCM and the Token Ring Network 

As mentioned earlier, the traffic in the token ring network is divided into syn- 
chronous and asynchronous traffic. We use the regular ADPCM output as the 
synchronous traffic and the output of the embedded quantizer as the asyn- 
chronous traffic. Thus the side channel simply consists of the asynchronous 
traffic. The reasoning behind this approach is that the system cannot afford 
to lose the regular ADPCM output which also has timing constraints. The 
side information is not as critical, because the image can be reconstructed at
the receiver without the side information, albeit with some degradation. 

In the analysis of a token protocol, it is generally assumed that the queues
of asynchronous messages to be sent are heavily loaded, so that messages are 
always available for transmission. In our case the asynchronous information 
queue will not be heavily loaded because the side information needs to be 
sent only when there is an edge. 

The size of the packet for synchronous traffic is fixed. Whenever the 
node captures an early token, the size of the packet will be increased to 
match the available capacity and the regular information followed by the 
side information, if present, will be sent. The most recent side information 
will be transmitted in the bandwidth available for asynchronous traffic. If 
there is any side information left after transmission, it will be discarded. 

Whenever the receiver receives an increased size packet it takes the bits 
received after the regular size of the packet as side information. This side 
information is added to the corresponding most recent "edge" pixels.
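The packet handling described above can be sketched as follows. This is an illustration under stated assumptions: bits are held in Python lists, the pending side information is a flat list with the most recent bits at the end, and the names `build_packet` and `split_packet` are hypothetical. The fixed synchronous packet size of 1540 bits is taken from the simulation parameters.

```python
SYNC_BITS = 1540   # fixed packet size for synchronous information


def build_packet(sync_payload, side_bits, async_capacity):
    """Append as much of the most recent side information as the token allows.

    sync_payload   : list of SYNC_BITS bits of regular ADPCM output
    side_bits      : pending side-information bits, most recent last
    async_capacity : number of bits available for asynchronous traffic
                     (0 when the token is late)
    """
    if len(sync_payload) != SYNC_BITS:
        raise ValueError("synchronous payload must have a fixed size")
    n = min(len(side_bits), max(async_capacity, 0))
    sent_side = side_bits[len(side_bits) - n:]   # keep only the most recent n bits
    # Side information that does not fit is discarded, not queued.
    return sync_payload + sent_side


def split_packet(packet):
    """Receiver side: bits beyond the regular size are side information."""
    return packet[:SYNC_BITS], packet[SYNC_BITS:]
```

A packet of exactly `SYNC_BITS` bits therefore signals that no side information was sent, so the receiver needs no extra header to tell the two cases apart.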

5 Simulation of Proposed Scheme 

A fifty node token ring network was simulated to test the proposed system. 
The parameters used in the simulation are given in Table 1. 

Parameter                                  Value
Number of nodes                            50
Bit traveling speed                        200,000 meters/msec
Distance between nodes                     100 meters
Data generation rate                       11,000 bits/sec
Packet size for synchronous information    1540 bits
Time taken by node to read the data        10 μsec
Channel capacity of coaxial cable          12,000 bits/msec

Table 1: Simulator parameters

The system is assumed to work under the following general conditions:

• The packet arrival process at each node follows a Poisson distribution.
The actual image information is taken at node 1, with the regular ADPCM
output arriving into one buffer and the side information into a separate
buffer.

• The messages transmitted by each station belong to two
classes, i.e. asynchronous and synchronous messages.

• The access mechanism is based on the timed token approach, but different
classes of asynchronous messages are not considered.

• The queues of asynchronous messages are not heavily loaded. 

• When the network is initialized, the token rotation will only allow the 
transmission of synchronous messages; the second token rotation will 
allow both synchronous and asynchronous messages. 

Two types of simulations were performed. 
1. Messages transmitted at all nodes consisted of both synchronous and
asynchronous messages.

2. Only synchronous messages were transmitted at all nodes. 

Load versus delay and throughput versus delay characteristics were plotted
for both cases and are shown in Figures 1 and 2. Load is defined as the
inverse of the mean inter-arrival time A. The graph in Figure 1 shows that
at a particular value of the load, the average delay of a packet in the network
with both classes of traffic is greater than when only synchronous
messages are transmitted. This is especially true at low loads; as the traffic
increases there is not much difference in the delay for the two cases. This
is because the network will not have enough bandwidth available for asynchronous
traffic when the traffic is heavy.

The token ring network transmitting both synchronous and asynchronous
messages provides better delay versus throughput characteristics. Here again
at large values of throughput, there is not much difference between the curves.
The reason for the better throughput versus delay characteristics is that at
low loads, the network can utilize the channel more efficiently by transmitting
asynchronous messages whenever the bandwidth becomes available.

The two images shown in Figure 3 were used to test the proposed approach
and the results obtained at different network loads are shown in Table 2 and
Table 3.

Run   Load   Delay (msec)   Throughput   Rate (bpp)   PSNR (dB)
1     .226   177.3          0.8033735    2.011        33.33
2     .185   147.8          0.8033297    2.014        33.36
3     .156   123.3          0.8032793    2.057        34.76
4     .136   105.3          0.8032306    2.126        35.69
5     .119   89.19          0.8030134    2.221        36.05
6     .107   73.17          0.8019965    2.223        37.22
7     .097   58.35          0.8000950    2.237        37.59
8     .085   42.82          0.7967703    2.238        37.62
9     .081   38.87          0.7956204    2.238        37.62

Table 2: Results obtained at different network loads for couple image

Run   Load   Delay (msec)   Throughput   Rate (bpp)   PSNR (dB)
1     .254   194.1          0.8033794    2.002        29.13
2     .169   135.3          0.8033345    2.016        29.22
3     .156   123.6          0.8033151    2.039        29.32
4     .145   114.4          0.8033025    2.075        29.78
5     .127   97.5           0.8032005    2.1831       30.68
6     .107   70.3           0.8019068    2.227        30.90
7     .092   51.8           0.7989234    2.237        31.02
8     .081   38.4           0.7954938    2.237        31.02

Table 3: Results obtained at different network loads for aerial image

The first two entries in these tables were obtained by operating the network
at high loads, which is in the unstable region. At these high loads almost
every node had a packet to send, and there was no bandwidth available
for side information. As the load was decreased, more and more side information
was transmitted, providing a better reconstructed image at the receiver.

In this simulation, at a load of around 0.09, there is enough bandwidth avail- 
able for node 1 to transmit all the side information. Further reduction of the 
load did not have any effect on the quality of the reconstructed image. 

Error images for the couple image were obtained at four different net- 
work loads and are shown in Figure 4. The error image without any side 
information is shown in Figure 4a for comparison. For the image shown in 
Figure 3b, side information was sent in the areas of the woman’s hands, the 
woman’s left knee and in some portions of the couple’s heads. In Figure 4c
the edge errors are corrected in the region of the woman’s hands, the man’s
shoulder, the photo frame, and the couple’s heads. Some of the edge errors 
at the man’s legs are also corrected. But in this case edge errors are present 
at the woman’s left knee. In the image shown in Figure 4d all edge errors are 
corrected except a few errors at the intersection of the man’s leg and chair. 
For the image shown in Figure 4e all side information was transmitted. 

6 Conclusion 

We have presented a low complexity scheme which can be used for transmitting
images over local area networks. Because of its low complexity the scheme can
be operated at high rates and may be suitable for applications which require
low delay.

Abstract

We examine the situation where there is residual redundancy at the source coder output. We
have previously shown that this residual redundancy can be used to provide error correction without
a channel encoder. In this paper we extend this approach to conventional source coder/convolutional
coder combinations. We also develop a design for nonbinary encoders for this situation. We show
through simulation results that the proposed systems consistently outperform conventional
source-channel coder pairs with gains of greater than 10 dB at high probability of error.

1 Introduction

One of Shannon’s many fundamental contributions was his result that source coding and channel
coding can be treated separately without any loss of performance for the overall system [1]. The
basic design procedure is to select a source encoder which changes the source sequence into a series
of independent, equally likely binary digits followed by a channel encoder which accepts binary
digits and puts them into a form suitable for reliable transmission over the channel. However, the
separation argument no longer holds if either of the following two situations occurs:

i. The input to the source decoder is different from the output of the source encoder, which
happens when the link between the source encoder and source decoder is no longer error free,
or

ii. The source coder output contains redundancy.

Case (i) occurs when the channel coder does not achieve zero error probability and case (ii)
occurs when the source encoder is suboptimal. These two situations are common occurrences in
practical systems where source or channel models are imperfectly known, complexity is a serious
issue, or significant delay is not tolerable. Approaches developed for such situations are usually
grouped under the general heading of joint source/channel coding.

Most joint source channel coding approaches can be classified in two main categories: (A)
approaches which entail the modification of the source coder/decoder structure to reduce the effect
of channel errors [2-18], and (B) approaches which examine the distribution of bits between the
source and channel coders [19-21]. The first set of approaches can be divided still further into two
classes. One class of approaches examines the modification of the overall structure [2-10], while the
other deals with the modification of the decoding procedure to take advantage of the redundancy
in the source coder output.

In this paper we present an approach to joint source/channel coder design, which belongs to
category A, and hence we explore a technique for designing joint source/channel coders, rather
than ways of distributing bits between source coders and channel coders. We assume that the
two nonideal situations referred to earlier are present. For a nonideal source coder, we use MAP
arguments to design a decoder which takes advantage of redundancy in the source coder output to
perform error correction. We have previously shown that this approach can provide error protection
at high error rates [16, 17]. In this paper we show that the use of such a decoder in conjunction
with a channel encoder can provide excellent error protection over a wide range of channel error
probabilities. We then use the decoder structure to infer a channel encoder structure which is
similar to a nonbinary convolutional encoder.

2 The Design Criterion 

For a discrete memoryless channel (DMC), let the channel input alphabet be denoted by
$A = \{a_0, a_1, \ldots, a_{M-1}\}$, and the channel input and output sequences by
$Y = \{y_0, y_1, \ldots, y_{L-1}\}$ and $\hat{Y} = \{\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1}\}$,
respectively. If $\mathcal{A} = \{A_i\}$ is the set of sequences
$A_i = \{a_{i,0}, a_{i,1}, \ldots, a_{i,L-1}\}$, $a_{i,k} \in A$, then the optimum receiver (in the
sense of maximizing the probability of making a correct decision) maximizes $P[C]$, where

$$P[C] = \sum_{A_i} P[C \mid \hat{Y}]\, P[\hat{Y}].$$

This in turn implies that the optimum receiver maximizes $P[C \mid \hat{Y}]$. When the receiver
selects the output to be $A_k$, then $P[C \mid \hat{Y}] = P[Y = A_k \mid \hat{Y}]$. Thus, the
optimum receiver selects the sequence $A_k$ such that

$$P[Y = A_k \mid \hat{Y}] \geq P[Y = A_i \mid \hat{Y}] \quad \forall i.$$

Noting that

$$P(Y \mid \hat{Y}) = \frac{P(\hat{Y} \mid Y)\, P(Y)}{P(\hat{Y})}$$

and for fixed length codes $P(\hat{Y})$ is irrelevant to the receiver’s operation, the optimal receiver
maximizes $P(\hat{Y} \mid Y)\, P(Y)$. If we impose a first order Markov assumption on $\{y_i\}$, we can
easily show that

$$P(\hat{Y} \mid Y)\, P(Y) = \prod_i P(\hat{y}_i \mid y_i)\, P(y_i \mid y_{i-1}).$$


This result addresses the situation in case (ii), i.e., the situation in which the source coder output
(which is also the channel input sequence) contains redundancy. Using this result, we can design
a decoder which will take advantage of dependence in the channel input sequence. The physical
structure of the decoder can be easily obtained by examining the quantity to be maximized. The
optimum decoder maximizes $P(\hat{Y} \mid Y)\, P(Y)$ or equivalently $\log P(\hat{Y} \mid Y)\, P(Y)$, but

$$\log P(\hat{Y} \mid Y)\, P(Y) = \sum_i \log P(\hat{y}_i \mid y_i)\, P(y_i \mid y_{i-1}) \qquad (1)$$

which is similar in form to the path metric of a convolutional decoder. Error correction using
convolutional codes is made possible by explicitly limiting the possible codeword to codeword
transitions, based on the previous code input and the coder structure. At the receiver the decoder
compares the received data stream to the a priori information about the code structure. The output
of the decoder is the sequence that is most likely to be the transmitted sequence. In the case where
there is residual structure in the source coder output, the structure makes some sequences more
likely to be the transmitted sequence, given a particular received sequence. In other words, even
when there is no structure being imposed by the encoder, there is sufficient residual structure in
the source coder output that can be used for error correction. The structure is reflected in the
conditional probabilities, and can be utilized via the path metric in (1) in a decoder similar in
structure to a convolutional decoder. However, to implement this decoder we need to be able to
compute the path metric.
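
As an illustration of how a path metric of the form in (1) can be maximized, the following is a minimal sketch of a Viterbi-style search over symbol sequences. It is not the authors' implementation: the function name, the table layouts (`log_channel[received][sent]`, `log_trans[previous][current]`), the explicit prior on the first symbol, and the toy probabilities are all assumptions made for this example.

```python
import math

def map_sequence_decode(received, log_channel, log_trans, log_prior):
    """Return the symbol sequence y maximizing
    sum_i [log P(r_i | y_i) + log P(y_i | y_{i-1})] (hypothetical helper)."""
    if not received:
        return []
    M = len(log_prior)
    # Metric of the best path ending in each symbol after the first observation.
    metric = [log_prior[s] + log_channel[received[0]][s] for s in range(M)]
    back = []  # back[t][s] = best predecessor of symbol s at step t + 1
    for r in received[1:]:
        new_metric, pointers = [], []
        for s in range(M):
            best_prev = max(range(M), key=lambda p: metric[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_metric.append(metric[best_prev] + log_trans[best_prev][s]
                              + log_channel[r][s])
        metric = new_metric
        back.append(pointers)
    # Trace back from the best final symbol.
    state = max(range(M), key=lambda s: metric[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy two-symbol example; every probability here is made up for illustration.
log_channel = [[math.log(0.8), math.log(0.2)],
               [math.log(0.2), math.log(0.8)]]   # indexed [received][sent]
log_trans = [[math.log(0.95), math.log(0.05)],
             [math.log(0.05), math.log(0.95)]]   # indexed [previous][current]
log_prior = [math.log(0.5), math.log(0.5)]
decoded = map_sequence_decode([0, 0, 1, 0, 0], log_channel, log_trans, log_prior)
print(decoded)  # [0, 0, 0, 0, 0]: the isolated 1 is treated as a channel error
```

With strongly correlated symbols the isolated 1 is corrected, while with uniform transition probabilities the same routine simply returns the received sequence, which is the behaviour the text attributes to residual structure.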

Examining the branch metric, we see that it consists of two terms, $\log P(\hat{y}_i \mid y_i)$ and
$\log P(y_i \mid y_{i-1})$. The first term depends strictly on our knowledge of the channel. The second
term depends only on the statistics of the source sequence. In our simulation results we have assumed
that the channel is a binary symmetric channel with known probability of error. We have obtained the
second term using a training sequence.
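
The two terms of the branch metric can be sketched as follows. This is only an illustration under stated assumptions: symbols are taken to be fixed-length binary words sent bit by bit over the binary symmetric channel, the transition probabilities are estimated by counting adjacent pairs in the training sequence, and the additive smoothing (so that unseen transitions do not give a log of zero) is a choice made here, not something described in the text. All function names are hypothetical.

```python
import math
from collections import Counter

def bsc_log_likelihood(sent, received, bits, p):
    """log P(received | sent) for `bits`-bit symbols over a BSC with crossover p."""
    d = bin(sent ^ received).count("1")          # number of flipped bits
    return d * math.log(p) + (bits - d) * math.log(1.0 - p)

def estimate_log_transitions(training, M, smoothing=1.0):
    """Estimate log P(y_i = b | y_{i-1} = a) by counting pairs in a training sequence.
    The additive smoothing is an assumption of this sketch."""
    counts = Counter(zip(training[:-1], training[1:]))
    table = []
    for a in range(M):
        total = sum(counts[(a, b)] for b in range(M)) + smoothing * M
        table.append([math.log((counts[(a, b)] + smoothing) / total)
                      for b in range(M)])
    return table

def branch_metric(received, current, previous, bits, p, log_trans):
    """Channel term plus source term for one branch of the trellis."""
    return bsc_log_likelihood(current, received, bits, p) + log_trans[previous][current]

log_trans = estimate_log_transitions([0, 0, 0, 1, 1, 1, 0, 0], M=2)
print(branch_metric(received=1, current=0, previous=0, bits=1, p=0.1,
                    log_trans=log_trans))
```

The first function uses only the channel model and the second only the source statistics, mirroring the separation of the two terms described above.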

In [17] we showed that the use of the decoder led to dramatic improvements under high error
rate conditions. However, at low error rates the performance improvement ranged from nonexistent
to minimal. This is in contrast to standard error correcting approaches, in which the greatest
performance improvements are at low error rates, with a rapid deterioration in performance at
high error rates. In this work we combine the two approaches to develop a joint source channel
codec which provides protection equal to the standard channel encoders at low error rates while
also providing significant error protection at high error rates.

3 Convolutional Encoders and Joint Source/Channel Decoder

As convolutional coders provide excellent error protection at low error rates, and have a decoder 
structure similar to the JSC decoder, one way we can combine the two approaches is to obtain 
the transition probabilities of the convolutional encoder output and use the Joint Source/Channel 
(JSC) decoder described above instead of the conventional convolutional decoder. 

The convolutional decoder uses the structure imposed by the encoder and the Hamming metric
to provide error protection. The decoder does not use any of the residual structure from the source
coder output. We can make use of the residual structure by noting that the path labels transmitted
by the convolutional encoder comprise the channel input alphabet $\{y_i\}$. We can then use a training
sequence to obtain the transition probabilities $P(y_i \mid y_{i-1})$ and an estimate of the channel error
probability to obtain $P(\hat{y}_i \mid y_i)$. These can be used to compute the branch metric, which can be
used instead of the Hamming metric in the decoder.

We simulated this approach using a two bit DPCM system as the source encoder. We used the 
two images shown in Figure 1 as the source. The USC Girl image was used for training (obtaining 
the requisite transition probabilities) and the USC Couple image for testing. The output of the 
DPCM system was encoded using a (2,1,3) convolutional encoder with connection vectors 

$$g^{(1)} = 64, \qquad g^{(2)} = 74.$$

The convolutional encoder was obtained from [23]. The performance of the different systems was 
evaluated using two different measures. One was the reconstruction signal-to-noise ratio (RSNR) 
defined as 

$$\mathrm{RSNR} = 10 \log_{10} \frac{\sum_i u_i^2}{\sum_i (u_i - \hat{u}_i)^2}$$

where $u_i$ is the input to the source coder (source image) and $\hat{u}_i$ is the output of the source
decoder (reconstructed image). The other performance measure was the decoded error probability. The
received sequence was decoded using either a standard convolutional decoder or the JSC decoder.
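
For concreteness, the reconstruction signal-to-noise ratio can be computed as in the short sketch below. It assumes the usual definition, in which the energy of the source samples is divided by the energy of the reconstruction error and the result is expressed in decibels; the function name and the sample values are illustrative only.

```python
import math

def rsnr_db(source, reconstructed):
    """10*log10( sum u_i^2 / sum (u_i - u_hat_i)^2 ), in dB."""
    signal = sum(u * u for u in source)
    noise = sum((u - v) ** 2 for u, v in zip(source, reconstructed))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)

# Made-up pixel values: signal energy 1400, error energy 2.
print(rsnr_db([10, 20, 30], [11, 19, 30]))
```
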
A block diagram of the system is shown in Figure 2. The results are presented in Figure 3. While
there is some improvement in the decoded error probability for high error rates, the RSNR actually
goes down for the MAP decoded sequence. This is somewhat disappointing until one realizes that
the JSC decoder makes use of the structure in the nonbinary output of the source coder. When
we used the (2,1,3) coder we destroyed some of this structure because the source coder puts out
two bit words while the channel coder codes the input one bit at a time. Therefore, if we could
preserve the structure in the source coder output by coding the two bit words as a unit we should
get improved performance. To verify this we conducted another set of simulations with a rate 1/2
(4,2,1) convolutional code with connection vectors

= 6 ffp' = 0 = 6 ?1 = 4 

S?>=0 # = 6 ,W = 2. 

In this case there is a one-to-one match between the source coder output and the channel coder
input, and the results shown in Figure 4 reflect this fact. There is considerable improvement in
the decoded error probability and there is about a 5 dB improvement obtained by using the MAP
decoder at a probability of error of 0.1. These results justify the contention that for best use of
the JSC decoder the input alphabet size of the channel coder should be the same as the size of
the output alphabet of the source coder. To this point we have been using a MAP decoder with
an encoder designed to maximize performance with a Hamming metric. In the next section we
propose a general channel coder design to go with the MAP decoder which has the added flexibility
of being able to match the size of the source coder output alphabet.

4 A Modified Convolutional Encoder 

Given that the preservation of the structure in the source coder output requires the channel coder 
input alphabet to have a one-to-one match with the generally nonbinary source coder, we propose 
a general nonbinary convolutional encoder (NCE) whose input alphabet has the requisite property. 

Let $x_n$, the input to the NCE, be selected from the alphabet $A = \{0, 1, 2, \ldots, N-1\}$, and let
$y_n$, the output of the NCE, be selected from the alphabet $S = \{0, 1, 2, \ldots, M-1\}$. Then
the proposed NCEs can be described by the following mappings:

Rate 1/2 NCE: $M = N^2$; $y_n = N x_{n-1} + x_n$
Rate 1/3 NCE: $M = N^3$; $y_n = N^2 x_{n-2} + N x_{n-1} + x_n$
Rate 2/3 NCE: $M = N^3$; $y_n = N^2 x_{2n-2} + N x_{2n-1} + x_{2n}$

Because of lack of space we will only describe and present the results for the rate 1/2 NCE.
The description and results for the other cases can be found in [24] and are similar to the results
for the rate 1/2 NCE code.

The number of bits required to represent the output alphabet of the NCE codes using a fixed
length code is

$$\lceil \log_2 M \rceil = \lceil \log_2 N^2 \rceil = \lceil 2 \log_2 N \rceil.$$

Therefore in terms of rate, the rate 1/2 NCE coder is equivalent to a rate 1/2 convolutional encoder.
The encoder memory in bits is $2 \lceil \log_2 N \rceil$ as each output value depends on two input values.

As an example, consider the situation when $N = 4$. Then $A = \{0, 1, 2, 3\}$ and $S = \{0, 1, 2, \ldots, 15\}$.
Given the input sequence $x_n$: 0 1 3 0 2 1 1 0 3 3 and assuming the encoder is initialized with
zeros, the output sequence will be $y_n$: 0 1 7 12 2 9 5 4 3 15.
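
The example can be reproduced with a few lines of code. The sketch below assumes the rate 1/2 mapping in which each output is N times the previous input plus the current input, with the previous input initialized to zero; the function name is hypothetical.

```python
def nce_rate_half_encode(symbols, N, initial=0):
    """Rate 1/2 nonbinary convolutional encoding: y_n = N * x_{n-1} + x_n."""
    previous = initial           # encoder memory, initialized with zeros
    outputs = []
    for x in symbols:
        if not 0 <= x < N:
            raise ValueError(f"input symbol {x} is outside 0..{N - 1}")
        outputs.append(N * previous + x)
        previous = x
    return outputs

x = [0, 1, 3, 0, 2, 1, 1, 0, 3, 3]
print(nce_rate_half_encode(x, N=4))  # [0, 1, 7, 12, 2, 9, 5, 4, 3, 15]
```

The printed sequence matches the output sequence quoted in the example.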

The encoder memory is four bits. Notice that while the encoder output alphabet is of size $N^2$,
at any given instant the encoder can only emit one of $N$ different symbols, as should be
the case for a rate 1/2 convolutional encoder. For example, if $y_{n-1} = 0$, then $y_n$ will take on a
value from $\{0, 1, 2, \ldots, N-1\}$. In general, given a value for $y_{n-1}$, $y_n$ will take on a value from
$\{\alpha N, \alpha N + 1, \alpha N + 2, \ldots, \alpha N + N - 1\}$, where $\alpha = y_{n-1} \pmod N$. This structure can be
used by the decoder to provide error protection. The encoder is shown in Figure 5.
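
The constraint on consecutive outputs can be checked mechanically. The sketch below assumes the relation described above, namely that the previous output taken modulo N selects which block of N consecutive values the next output may come from; the function names and the corrupted test sequence are illustrative.

```python
def allowed_next_outputs(prev_output, N):
    """Outputs the rate 1/2 NCE can emit after prev_output: {aN, ..., aN + N - 1}
    with a = prev_output mod N."""
    alpha = prev_output % N
    return list(range(alpha * N, alpha * N + N))

def is_valid_nce_sequence(outputs, N, initial=0):
    """True if every output is reachable from its predecessor (y_{-1} = initial)."""
    previous = initial
    for y in outputs:
        if y not in allowed_next_outputs(previous, N):
            return False
        previous = y
    return True

print(allowed_next_outputs(0, 4))                                   # [0, 1, 2, 3]
print(is_valid_nce_sequence([0, 1, 7, 12, 2, 9, 5, 4, 3, 15], 4))   # True
print(is_valid_nce_sequence([0, 1, 8, 12, 2, 9, 5, 4, 3, 15], 4))   # False
```

In the last call the third symbol has been changed from 7 to 8, which is not reachable from the preceding output 1, so the violation is detectable from the sequence alone.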

4.1 Binary Encoding of the NCE Output 

We will make use of the residual structure in the source coder output (which is preserved in the NCE 
output) at the receiver. However, we can also make use of this structure in selecting binary codes 
for the NCE output. An intelligent assignment of binary codes can improve the error correcting 
performance of the system. 

When each allowable sequence is equally likely, there is little reason to prefer one particular
assignment over others. However, when certain sequences are more likely to occur than others, it
would be useful to make assignments which increase the ‘distance’ between likely sequences. While
for small alphabets it is a simple matter to assign the optimum binary codewords by inspection,
this becomes computationally impossible for larger alphabets. We use a rather simple heuristic
which, while not optimal, provides good results.

Our strategy is to try to maximize the Hamming distance between codewords that are likely
to be mistaken for one another. First we obtain a partition of the alphabet based on the fact that
given a particular value for $y_{n-1}$, $y_n$ can only take on values from a subset of the full alphabet.
To see this, consider the rate 1/2 NCE; then the alphabet $S$ can be partitioned into the following
sub-alphabets:

$S_0 = \{0, 1, 2, \ldots, N-1\}$
$S_1 = \{N, N+1, \ldots, 2N-1\}$
$\vdots$
$S_{N-1} = \{N(N-1), N(N-1)+1, \ldots, N^2-1\}$

where the encoder will select letters from alphabet $S_j$ at time $n$ if $j = y_{n-1} \pmod N$. Now for
each sub-alphabet we have to pick $N$ codewords out of $M$ ($= N^2$) possible choices. We first pick
the sub-alphabet containing the most likely letter. The letters in the sub-alphabet are ordered
according to their probability of occurrence. We assign a codeword $a$ from the list of available
codewords to the most probable symbol. Then we assign the complement of $a$ to the next symbol on
the list. Therefore the distance between the two most likely symbols in the list is $K = \lceil \log_2 M \rceil$
bits. We then pick a codeword $b$ from the list which is at a Hamming distance of $K/2$ from $a$ and
assign it and its complement to the next two elements on the list. This process is continued with
the selection of letters that are $K/2^j$ away from $a$ at the $j$th step until all letters in the sub-alphabet
have a codeword assigned to them.
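
As a sanity check on this strategy, the sketch below examines the codewords listed in Table 1, assuming that table gives the assignment for the $N = 4$, rate 1/2 NCE and that symbols $jN, \ldots, jN + N - 1$ form sub-alphabet $S_j$. It only measures Hamming distances inside each sub-alphabet; it does not reimplement the assignment heuristic, and the function names are hypothetical.

```python
# Codewords copied from Table 1 (symbol -> 4-bit code).
TABLE_1 = {
    0: "0000", 1: "0011", 2: "1100", 3: "1111",
    4: "1110", 5: "1101", 6: "0001", 7: "0010",
    8: "1011", 9: "0111", 10: "0100", 11: "1000",
    12: "0101", 13: "1001", 14: "1010", 15: "0110",
}

def hamming(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def sub_alphabet_distances(codes, N):
    """(min, max) pairwise Hamming distance inside each sub-alphabet S_j."""
    result = {}
    for j in range(N):
        words = [codes[s] for s in range(j * N, (j + 1) * N)]
        dists = [hamming(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
        result[j] = (min(dists), max(dists))
    return result

print(sub_alphabet_distances(TABLE_1, 4))
# {0: (2, 4), 1: (2, 4), 2: (2, 4), 3: (2, 4)}
```

Every sub-alphabet consists of two complementary pairs (distance 4) that are at distance 2 from each other, which is consistent with assigning a codeword, its complement, and then a codeword at distance $K/2$ with its complement.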
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The proposed approach was simulated using the same setup as was used in the preceding simula- 
tions. We show the results for the rate 1/2 NCE coder in Figure 6 and comparisons in Figure 7. 
Note the dramatic improvement in performance with the rate 1/2 NCE code. At a probability of
error of 0.1 the RSNR drops by less than 1 dB. For the same channel conditions the use of the
(2,1,3) code results in a drop of more than 10 dB. Looking at the decoded error probabilities, even
when the channel error probability is 0.25, the decoded error probability is less than lO’^ This 
improvement has been obtained with only a minimal increase in complexity. Similar results have 

also been obtained for rate 1/3 and 2/3 NCE codes. 

5 Conclusion 

If the source and channel coder are designed in a “joint” manner, that is, the design of each takes
into account the overall conditions (source as well as channel statistics), we can obtain excellent
performance over a wide range of channel conditions. In this paper we have presented one such
design. The resulting performance improvement seems to validate this approach, with a “flattening
out” of the performance curves. This flattening out of the performance curves makes the approach
useful for a large variety of channel error conditions.

Table 1: Codeword Assignments


Symbol   Code    Symbol   Code
0        0000    8        1011
1        0011    9        0111
2        1100    10       0100
3        1111    11       1000
4        1110    12       0101
5        1101    13       1001
6        0001    14       1010
7        0010    15       0110

Figure 3a. Decoded error probability for the (2,1,3) convolutional coder.
Figure 3b. RSNR for the (2,1,3) convolutional coder.

Figure 4a. Decoded error probability for the (4,2,1) convolutional coder.
Figure 4b. RSNR for the (4,2,1) convolutional coder.

Figure 5. Rate 1/2 Nonbinary Convolutional Encoder

Figure 6a. Decoded vs channel error for rate 1/2 NCE
Figure 6b. RSNR for rate 1/2 NCE coder vs channel error

Figure 7a. Decoder error probability for the three MAP decoded systems
Figure 7b. RSNR vs channel error for the three MAP decoded systems